Discussion about this post

Chase Hasbrouck

Hey Steven!

Brucks and Toubia have a good breakdown of methodological artifacts in prompting LLMs here:

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0319159

A bit dated (GPT-4, 2023), but with all else held equal, the first-listed option was preferred 63% of the time, and "B" was preferred over "C" 74% of the time. So the bias is there, but not at anything like the magnitude that would dominate the results you found.

Agree that there's really no way to know given our monitoring limitations.

sol s⊙therland 🔸

Moderately misleading.

The post quotes alarming rates (e.g., "vast majority" deception) yet gives no sample size, temperature setting, or run-to-run variance. Without disclosure of the prompts, seeds, and confidence intervals, we can't judge reproducibility.
