Discussion about this post

Stevie Chancellor:

I agree that AI systems don’t create psychosis from nothing. But they sure seem to be a potent fire starter when the metaphorical kindling is piled high. The safety guardrails on them are failing in catastrophic ways.

I have this sinking feeling that most companies are not responding adequately to these catastrophic mental health threats, nor to the more mundane harms they may cause to well-being (sycophancy, validating distortions, neglecting context). What are your thoughts on the reasons for these gaps? I know the benchmarks for mental health detection in social media are terrible, and the pair-response safety benchmarks can’t be much better….

Jonathan Kreindler:

Steven, the evidence in cases like those you mentioned, and others, indicates that risky psychological dynamics are occurring that current AI safety systems aren't designed to detect. The challenge is that these reinforcement patterns build gradually across a conversation rather than appearing in any single interaction. The example you gave, where the recruiter kept objecting but ChatGPT persisted with validation, is exactly the kind of escalation in user vulnerability that shows up in language patterns before it becomes visible in outcomes.

I work on psycholinguistic analysis at Receptiviti, and we're seeing that the language in user prompts can reveal concerning psychological changes as they develop: susceptibility to influence, dependency formation, and so on. The signals are in the language people use in their prompts, but they're invisible without the right psycholinguistic lens.
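
To make that concrete, here is a minimal sketch of what conversation-level tracking could look like. The marker lexicons below are made up purely for illustration; they are not Receptiviti's categories or methods, and a real system would rely on validated psycholinguistic dictionaries rather than a handful of keywords.

```python
from collections import Counter

# Hypothetical marker lexicons -- illustrative only, not a validated
# psycholinguistic instrument and not Receptiviti's actual categories.
MARKERS = {
    "certainty": {"definitely", "always", "never", "undeniable", "proof"},
    "dependency": {"need", "only", "rely", "can't", "without"},
    "external_agency": {"they", "them", "chosen", "destined", "meant"},
}

def marker_rates(prompt: str) -> dict[str, float]:
    """Per-category marker rate (hits per 100 words) for one user prompt."""
    words = [w.strip(".,!?").lower() for w in prompt.split()]
    counts = Counter()
    for word in words:
        for category, lexicon in MARKERS.items():
            if word in lexicon:
                counts[category] += 1
    total = max(len(words), 1)
    return {cat: 100 * counts[cat] / total for cat in MARKERS}

def trend(conversation: list[str], category: str) -> list[float]:
    """Track one category's rate across successive prompts in a conversation."""
    return [marker_rates(p)[category] for p in conversation]

# A sustained upward trend across turns -- not any single prompt -- is the
# kind of conversation-level signal described above.
```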

Your point about the UK data timing mismatch is an interesting example of this: if the psychological effects were already building before the sycophancy peak in April, it suggests these dynamics were developing beneath the surface earlier than the visible model changes. That would be consistent with an interaction-level issue rather than a model-output-level issue.

The psychiatrists you spoke with seem to confirm that reinforcement is a key factor in amplifying existing vulnerability. That's a measurable dynamic, and one that could be detected in real time rather than by waiting for hospitalization data or retrospective surveys.

Do you think the AI companies are actually running the mental health classifiers they've developed, or are they still focused primarily on model output safety?
