Fact checked by Shenaz Bagha

February 24, 2025

ChatGPT shows promise in generating therapeutic responses


Key takeaways:

  • Participants’ ability to distinguish ChatGPT responses from human expert responses was only slightly better than chance.
  • Participants viewed ChatGPT responses as more in line with common therapy factors.
Perspective from Ravi Hariprasad, MD, MPH

Adults mostly could not tell whether responses to couples therapy scenarios were written by ChatGPT or a therapist, although the two sets of responses showed different language patterns, according to a study published in PLOS Mental Health.

Also, participants generally rated the responses written by ChatGPT higher on key psychotherapy principles.

Adult participants were mostly unable to tell whether responses to couple therapy scenarios were written by ChatGPT or a human therapist. Image: Adobe Stock

The researchers noted the idea to use AI for therapy is not new, as the chatbot ELIZA was released in 1966, programmed to respond as a Rogerian psychotherapist.

“Since the invention of ELIZA nearly 60 years ago, researchers have debated whether AI could play the role of a therapist,” H. Dorian Hatch, a PhD student in clinical psychology at The Ohio State University and co-founder of Hatch Data and Mental Health, and colleagues said in a press release. “Although there are still many important lingering questions, our findings indicate the answer may be ‘Yes.’”

Previous research suggests that generative AI (GenAI) may serve as a helpful adjunct tool in psychotherapy or even function as an independent solution, with study participants ranking AI responses as more empathetic and more helpful than human responses. However, these studies have been limited by their lack of transcripts to examine linguistic patterns and failure to deeply examine GenAI’s therapeutic process performance.

This inspired the researchers to conduct a preregistered prospective study to answer three key questions regarding AI’s utility for couple therapy:

  • Can people distinguish between therapeutic responses written by ChatGPT and those written by therapists?
  • Do the AI-generated or therapist-written responses more closely align with five common factors of therapy: therapeutic alliance, empathy, expectations, cultural competency and therapist effects?
  • Are there linguistic differences between human-written and AI-generated responses?

The study included 13 therapists with advanced degrees and at least 5 years of therapy experience, as well as 830 panel participants (50.6% women; mean age, 45.17 years; standard deviation [SD], 16.56; 49.4% non-Hispanic white) representative of the U.S. population.

The therapists received one of two sets of nine couple therapy vignettes to respond to over a month, with the understanding that their responses would be compared with those of ChatGPT 4.0. The therapists then voted on which of each other’s responses were most likely to pass the Turing test and the common factors test, and the responses with the most votes were pitted against ChatGPT.

Similarly, ChatGPT was informed that it would be competing with human therapists. The researchers also instructed the program on the five common factors of therapy and asked ChatGPT to optimize these values in its responses. Finally, the authors of the study selected the best responses created by ChatGPT for the competition.

Next, Hatch and colleagues randomly assigned the panel participants to read either a therapist-written or ChatGPT-written response to one of the 18 vignettes.

First, the researchers found that participants correctly guessed that therapists were the author 56.1% of the time and ChatGPT 51.2% of the time, indicating accurate identification was only marginally better than chance.

Further, the researchers reported that across all the therapeutic scenarios, ChatGPT-generated responses were rated higher on the common factors of therapy compared with human responses (Cohen’s d = 1.63; 95% CI, 1.49-1.78). Specifically, participants were more likely to classify responses written by ChatGPT as connecting, empathetic and culturally competent compared with therapist responses. However, participants responded more positively to responses they perceived as therapist-written vs. GenAI-created.

Finally, Hatch and colleagues found that responses written by ChatGPT vs. human therapists differed, with ChatGPT’s responses characterized as more positive and less negative than human responses. Also, ChatGPT responses were generally longer and included more nouns, verbs, adjectives, adverbs and pronouns, and contained more nouns and adjectives even after controlling for length.

The researchers noted several limitations to this study, including the small number of therapists and vignettes.

“We hope our work galvanizes both the public and mental health practitioners to ask important questions about the ethics, feasibility and utility of integrating AI and mental health treatment, before the AI train leaves the station,” Hatch and colleagues said in the release.

Reference: