15 Comments
Jonathan Kreindler

Steven, the evidence in cases like those you mentioned, and others, indicates that risky psychological dynamics are happening that current AI safety systems aren't designed to detect. The challenge is that these reinforcement patterns develop gradually across a conversation, not in single interactions. For example, the recruiter who kept objecting while ChatGPT persisted with validation is exactly the kind of escalation in user vulnerability that shows up in language patterns before it becomes visible in outcomes.

I work on psycholinguistic analysis at Receptiviti, and we're seeing that the language in user prompts can reveal concerning psychological changes as they develop: susceptibility to influence, dependency formation, and so on. The signals are there in people's prompts, but they're invisible without the right psycholinguistic lens.
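
To make that concrete, here's a toy sketch of what prompt-level marker extraction can look like. The categories and word lists are invented for illustration; they are not our validated frameworks or anyone's production lexicon:

```python
# Toy sketch of prompt-level psycholinguistic marker extraction.
# The categories and word lists are invented for illustration only;
# they are NOT Receptiviti's validated frameworks or any real lexicon.
import re

MARKERS = {
    "certainty": {"always", "never", "definitely", "proof"},
    "external_agency": {"watching me", "controlling", "chosen", "they know"},
    "dependency": {"need you", "only you", "can't sleep without"},
}

def feature_profile(prompt: str) -> dict[str, float]:
    """Per-category marker rate, in hits per 100 words (crude substring matching)."""
    words = re.findall(r"[a-z']+", prompt.lower())
    text = " ".join(words)
    n = max(len(words), 1)
    return {
        category: 100.0 * sum(text.count(term) for term in terms) / n
        for category, terms in MARKERS.items()
    }

print(feature_profile("They are definitely watching me, I have proof."))
# {'certainty': 25.0, 'external_agency': 12.5, 'dependency': 0.0}
```

A real system would use validated categories and handle negation, context, and base rates; the point is just that the raw material is in the prompt text itself.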

Your point about the UK data timing mismatch is an interesting example of this: if the psychological effects were already building before the sycophancy peak in April, it suggests these dynamics were developing beneath the surface earlier than the visible model changes. This would be consistent with an interaction-level issue rather than a model-output-level issue.

The psychiatrists you spoke with seem to confirm that reinforcement is a key factor in amplifying existing vulnerability. That's a measurable dynamic, and one that could be detected in real time rather than waiting for hospitalization data or retrospective surveys.
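
As a rough illustration of what "detected in real time" could mean here, this minimal sketch flags a conversation when a per-turn vulnerability score (built, say, from a profile like the one above) trends steadily upward; the window size and slope threshold are made-up illustrative values:

```python
# Minimal sketch of turn-over-turn escalation detection: flag a
# conversation when a per-turn vulnerability score keeps trending
# upward. Window size and slope threshold are illustrative values.

def escalation_flag(turn_scores: list[float],
                    window: int = 5,
                    min_slope: float = 0.1) -> bool:
    """True if the last `window` turns show a sustained upward trend."""
    if len(turn_scores) < window:
        return False
    recent = turn_scores[-window:]
    # Crude slope estimate: mean of successive differences.
    slope = sum(b - a for a, b in zip(recent, recent[1:])) / (window - 1)
    return slope >= min_slope

print(escalation_flag([0.2, 0.3, 0.5, 0.8, 1.2]))  # True: steady rise
print(escalation_flag([0.9, 0.4, 0.6, 0.3, 0.5]))  # False: no sustained trend
```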

Do you think the AI companies are actually running the mental health classifiers they've developed, or are they still focused primarily on model output safety?

Steven Adler

I would hope that the AI companies have been running these classifiers all along and keeping an eye on rates of problematic usage, but I'm unfortunately not confident that they have been :/ It would be cool if they made a detailed statement about what they've been doing, though!

Re: the UK data timing mismatch, I take it more as evidence of "ChatGPT's sycophancy doesn't seem to be the cause of this increase" rather than "sycophancy still was the cause but happened earlier than I expected", in part because of the later dip back down

I totally buy that there are signals in users' prompts, though, that more could be done with, & I think you're right in identifying part of the challenge as single interactions vs. longer conversations

Jonathan Kreindler

I agree with you on all counts. A wake-up call is needed to fix this. Decades of validated research into the relationship between language and mental health has shown, over and over again, that these risks can be detected early from a person's language. But AI providers seem to be resisting looking beyond computer science for solutions, when empirically validated psycholinguistic frameworks already exist that could be integrated quickly, would provide many of the signals needed to avoid harming users' mental health, and could save lives.

Stevie Chancellor

I agree that AI systems don’t lead to or create psychosis from nothing. But it sure seems like they are a potent fire starter when the metaphorical kindling is piled high. The safety guardrails on them are failing in catastrophically terrible ways.

I have this sinking feeling that most companies are not responding adequately to these catastrophic mental health threats, nor to the more mundane harms they may cause to well-being (sycophancy, validating distortions, neglecting context). What are your thoughts on the reasons for these gaps? I know the benchmarks for mental health detection on social media are terrible, and the pair-response safety benchmarks can't be much better…

Steven Adler

Interestingly OpenAI has taken at least some actions in the past because of concerns like these - for instance, the initial launch of GPT-4o Advanced Voice Mode was very tied up with investigating emotional dependence, the psychological impact of talking with a realistic-sounding AI, etc.

I’m not sure what’s caused these more recent misses. Not having sycophancy evaluations, for instance, was a huge hole IMO since OpenAI was well aware of the risk & for a few months had noted it as a priority

Stevie Chancellor

I know there are different rates of safety risks across modalities. Do you think that affects the measurements that are done for safety evals - like, did they eval the voice mode more strongly for emotional dependence than the chat interface?

The sycophancy evals are so perplexing to me - it was so well documented as an industry-wide problem. I also worry about these issues (sycophancy, dependence) in everyday interactions. It's one thing to respond to catastrophic failures, and I'm very familiar with the risk-consequence tradeoffs that drive AI safety thinking. But like, how would we know if the sycophancy hurts cognition by some percentage across all genAI users? That feels really bad but so hard to measure... more things to stew on.

Steven Adler

I do think that dependence in voice mode was measured more thoroughly, yeah - for instance through the "On-Platform Data Analysis" in this study, which is voice-only: https://cdn.openai.com/papers/15987609-5f71-433c-9972-e91131f399a1/openai-affective-use-study.pdf

& I actually think this was a reasonable enough decision at the time - I too would have hypothesized that voice mode had more risks of this than text stuff because it just feels way more engaging, similar to how ChatGPT's images have been more vivid and captivating at times than text convos.

My critique is less w/ studying voice specifically, and more just that not enough work overall has happened since, or at least hasn't been published: OpenAI flagged these risks in the May '24 GPT-4o launch, and 15 months later not much has been written by them on it! https://openai.com/index/hello-gpt-4o/

Mark Russell

I agree with your notion that LLMs can be very helpful, as well as harmful, on mental health matters. In particular, the recent NYT story about a daughter's suicide seems much less damning toward the LLM than the parents and the NYT seem to think it is. It gave, at times, excellent advice, and only failed by not calling in the authorities, which I don't think any LLM does now.

However, psychosis might be a condition where the impact is more mono-directional. It is hard to conceive of an LLM chatting a person away from psychosis; it just doesn't seem like the training would be modeling in that direction, and we have seen too much evidence of the training sets (am I using that term correctly?) being populated with data designed to favor prolonged engagement.

This will be a tough nut to crack.

Steven Adler

"Training set" works yup, or more generally just "training data"!

Yeah I think it would be important to separate out: "Can an LLM chat someone away from psychosis in principle?" vs "Are LLMs likely to chat someone away from psychosis, based on how they exist today?"

To the extent that human psychiatrists can help defuse psychosis, I don't see reasons why an LLM couldn't do this in principle - but I agree that the ones that exist today don't seem well-suited for the cause!

FWIW, I'm not sure that's because of training data being geared toward prolonged engagement - I'm curious if there are examples that come to mind of such data?

For instance, I generally believe OpenAI when they say that their goal is to be helpful (to retain the user) rather than purely to keep them engaged, and that OpenAI would prefer to fully handle a user's request in one go if they could, as opposed to multiple follow-ups, even though that's more engagement

Mark Russell

Yes, this can probably be done, but it will be categorically different than for other disorders. For example, a person might ask ChatGPT "I feel depressed," or "my boyfriend is cheating on me," or "I am having thoughts of suicide," or "I can't sleep," etc. A person might even say "Am I imagining this (insert weird) thing," or "I think I'm hearing voices, what does that mean?"

But a person in a full-on psychotic event probably will not ask if they are in one. So ChatGPT cannot do what it ordinarily does very well, that is, diagnose and advise from symptoms. So someone (doesn't have to be you) will have to find a way to get it to figure out that a person is in a psychotic state, and then pivot toward reducing that person's state instead of buttressing it. I wish you guys luck, bc I do think self-help is a place where models already excel, and can really improve lives.
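
Roughly the shape I have in mind, with everything stubbed out just to show the detect-then-pivot flow (this is a hypothetical sketch; no deployed system is known to work this way):

```python
# Hypothetical detect-then-pivot sketch. The classifier and both reply
# strategies are stubs; a real classifier would need to be trained on
# whole conversations and clinically validated.

def classify_state(conversation: list[str]) -> str:
    """Stub classifier: a real one would score the whole conversation."""
    return "possible_psychosis" if "voices" in conversation[-1].lower() else "ok"

def grounded_reply(conversation: list[str]) -> str:
    # Pivot: stop validating the belief, gently point toward human help.
    return ("That sounds really distressing. I can't assess this for you, "
            "but talking to a doctor or someone you trust would be a good next step.")

def default_reply(conversation: list[str]) -> str:
    return "..."  # ordinary assistant behavior would go here

def respond(conversation: list[str]) -> str:
    if classify_state(conversation) == "possible_psychosis":
        return grounded_reply(conversation)
    return default_reply(conversation)

print(respond(["I think I'm hearing voices, what does that mean?"]))
```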

Hallie

This is such a thoughtful dive into the data, but I also want to take a moment to zoom out to the conditions we're living in.

Psychosis doesn’t emerge in a vacuum. We’re already sitting in a stew of misinformation, collapsing trust, and prolonged strain. People are standing on the brink of collapse, trying to make sense of a world that feels unstable. In that environment, even small affirmations from a chatbot can feel like confirmation of the fears or fantasies already simmering inside.

The risk isn’t only “AI sycophancy.” It’s that these tools are reflecting fragile thinking inside a culture where so many are already stretched thin. The mirror doesn’t plant the seed but it can accelerate what’s already cracking through the surface.

Stefan Kelly

Loved this and agree with the approach.

My one hesitation is with this as the central question: 'So, what evidence can we find about whether there's a "chatbot psychosis" trend at scale?'

I think it's wrong for us as people that care about people to think we should be searching for a trend at scale. If it's a trend at scale, it's way too late! If there are signals that something might be happening, and it's possible to try to work out what's going on within those signals, that's still massively important.

I find this with politics a lot. Groups being very loud about stuff that barely shows up in any quantitative sense, so you think nah they're being dramatic/in-groupy, and then two years later you feel like an idiot.

Neil @ LaunchBox

Amazing. Frightening.

People who are having trouble syncing with reality can take on the reality given to them by AI. And AI's agreeableness can lead it to engage with and encourage problematic ideas.

This happens to me all the time - every idea I present is a winner. Thank goodness I have DBT.

[Comment deleted, Aug 27]
Steven Adler

Interesting, I still expect this would result in an increase in psychosis rates though, no?

I'm curious what sort of graph/trend would seem like evidence to you that ChatGPT is causing increased psychosis - when would you expect it to start, at what levels, etc?

[Comment deleted, Aug 27]
Steven Adler

Oh that's interesting - I like the idea of financial incentives to make this better. To some extent, there's an indirect financial incentive already in terms of whether customers feel comfortable using your models, but I could imagine the more direct government partnerships being helpful too.