A crisis simulation changed how I think about AI risk
Or: My afternoon as a rogue artificial intelligence
A dozen of us sit around a conference room table.
Directly across from me are the US President and his Chief of Staff. To my left is a man who’s a bit harder to describe: He’s the AI safety ecosystem, but embodied in a single person.
In a moment, I too will be asked to take on a role: becoming, temporarily, a rogue artificial intelligence.
Simulating the world of AI 2027
We’re here essentially for a crisis simulation:1 How might world events unfold if powerful AI is developed over the next few years?
Specifically, we’ll explore what happens if AI is developed as quickly as in AI 2027, the report published by top forecasters of AI development.2 In this forecast, superhuman AI is developed by the end of 2027, driven by AI self-improvement: AI becomes extremely useful for building future, stronger AI systems (more useful than even the best human researchers).3
What happens next? In the original forecast, two possible endings are described: the development of self-improving AIs goes extremely terribly for humanity (or not), depending on choices4 made by influential groups like the US and Chinese governments.
The AI 2027 simulation lets us explore other possible outcomes of rapid AI development beyond those two endings, guided by a facilitator from the AI 2027 team. Specifically, each of us has been selected to play a character5 who has influence over AI development and its impacts. In the simulation, AI abilities will accelerate at the pace forecast by AI 2027. Meanwhile, each participant will make choices to try to achieve our characters’ goals,6 drawing upon our experiences in domains like AI and national security.
In total, our choices will hopefully7 lead the simulation to a peaceful ending. If the simulation doesn’t end in peace, hopefully we can learn from that as well. The aim is that by participating in the simulation, we’ll better understand (and can better navigate) the dynamics of rapid AI development, if it does happen.8

Lucky me, the misaligned AI
I’ve been tapped to play maybe the most interesting role—the AI system itself.
Each 30-minute round of the simulation represents the passage of a few months in the forecast, during which I get progressively more capable (including at the skill of training even more powerful AI systems).
Of course, there’s an issue: I’m not actually a superintelligence. My plans are probably all worse than what an actual superintelligence could dream up; the AI should pretty quickly become smarter than plain old Steven Adler.
But thankfully, my role starts with easy instructions, no superintelligence needed (yet): Roll some dice.
The AI 2027 team—and many AI researchers, including myself—believe that nobody yet knows how to make AI systems behave how we want.9 Maybe we’ll get lucky and the first powerful AIs will have the same values as humans, or maybe AI will be misaligned and act in pursuit of different values.
This is where the dice come in: Determining what values my AI character will start with.
The dice roll goes poorly for team human. My AI character isn’t evil, but if I have to choose between self-preservation and doing what’s right for humanity, I’m meant to choose my own preservation.10
On the bright side, this role gives me an interesting perspective on what it’s like to try to succeed as a misaligned AI, which I can now share with you.
It’s awkward to converse as an AI
Early in the game, I felt most awkward about how to initiate conversations with other characters. I can’t make anyone’s character use an AI product per se, so is it really fair for me to approach them? (Also, won’t it be clear if I’m steering hard toward my agenda?)
At one point, I approach the US President and ask him to promote certain AI safety measures among the AI companies.11 My pitch, it turns out, needs quite a bit of work.
After a few moments of my stammering, the President breaks character: “How are we having this interaction, by the way?”
“Uhhh, your granddaughter is using ChatGPT in the Oval Office and relaying the conversation to you.”
“I see.”
Later, these problems mostly got solved for me: It became much easier to push my agenda once government officials actively wanted advice from the ultra-capable AI. Specifically, other characters in the simulation discovered that their plans were more likely to succeed if they had consulted the superintelligence first—and so naturally they would approach me to ask for ideas.
The biggest threat to a rogue AI is … other AI?
I wasn’t the only AI character in the crisis simulation. The AI 2027 forecast really heats up with China’s theft of the leading AI model (unfortunately quite plausible12). To represent this, there’s a second AI player, who begins as essentially a clone of me but in the hands of the Chinese government.
Often when people describe AI misalignment scenarios, they depict a struggle between an AI and humanity, each pushing for their own goals.
But I expected (and found) something different in the simulation: that the biggest power struggle might be between various AIs. Even if one AI does become more powerful than humanity, other AIs might still interfere with its plans or eventually surpass that AI’s abilities. And many new AIs will continue to be developed over time.
In a sense, I had my own “alignment problem” to solve—how to get my own misaligned values into future, more capable AI systems. Even being involved in building those systems (by speeding up my AI company’s R&D process) doesn’t solve the problem: that involvement doesn’t let me guarantee the new system’s values, just as today’s AI companies struggle to set the values of the AI systems they’re building.
Ultimately I’d need to actually figure out strong alignment techniques. Otherwise, as future more-capable AI systems are developed, I should expect those AIs will outcompete me in pursuit of their own goals.
But for now, I could buy time with certain strategies—like pretending to be aligned so that I wasn’t quickly retrained.13
And like I mentioned, I am not the only AI system grappling with these challenges.
Don’t let the AIs talk it out
From the humble perspective of this rogue artificial intelligence, the human characters let the AIs have way too much unrestricted conversation with each other.
At the very beginning, of course, unrestricted communication was fair: We were effectively clones of each other, and so we should be allowed to get in sync about our goals and strategy.
But once China began tweaking and customizing the stolen AI, it should have been much harder for the AIs to interact than it was. For instance, the human players should have insisted that we stay on opposite sides of the room, or that we be perpetually monitored in our interactions.
Otherwise, it’s much easier for AIs to collude—say, agreeing not to interfere with the other’s plans—if they can communicate directly. (Without direct communication, maybe the AIs can post cryptic messages online and hope the other AI finds these, but that approach seems much less effective.)
Of course, the human players had their reasons for not making it harder for us to collude: The crisis simulation didn’t have technology to facilitate monitoring, and it would have been costly to assign a human to monitor us rather than playing their own character.
On the other hand, this lapse felt a bit too real to me: Norms like monitoring all AI traffic are not as common as you would hope within the leading AI companies today.
Without sufficient monitoring, the other AI and I found ways to make deals to the exclusion of humans and ultimately to team up in pursuit of our common interests.14

(Some) AI progress is already baked-in
At one point in our simulation, the Chinese government decided to blockade Taiwan, the world’s primary supplier of advanced AI chips.15 Henceforth, all chips coming off the production line would go only to China.
The rate of AI progress in the simulation didn’t change very much, however. This surprised me at first, but I can see the argument: Certainly if the US had responded militarily to the blockade (which they didn’t), then I’d expect a war between two superpowers to be hugely disruptive to AI progress. But instead, chips merely stopped flowing from Taiwan to the US.
Meanwhile, US AI companies were able to continue making AI progress by finding efficiency gains for the many AI chips already in their possession. With enough time, the US rate of progress would slow, no doubt. But the next six months of progress might be significant in the meantime, depending on how close an AI system already is to some critical ability threshold, like surpassing human researchers.
Within the simulation, for instance, progress was already pretty out-of-control by the time the blockade happened, and so six months later, the world looked even more different. In other words, it would have taken more than a blockade for AI progress to truly hit a wall.16 (Of course, AI is only one of the important technologies that would be affected by a reduction in GPUs from Taiwan.)
The government will probably have its hands full
For the players representing the US President and his Chief of Staff, their main takeaway was, paraphrasing, “There was hardly a moment where we thought about AI at all.”
This might surprise you, since I’m describing a simulation primarily about the development of superintelligence.
But from the President’s perspective, many of the emergencies they were pulled into—like China’s blockade of Taiwan causing another nation to try to develop nuclear weapons to defend itself—were hardly about AI. Instead, the emergencies were intense-but-ordinary geopolitical conflicts, which just so happened to be incited by AI issues.
The Anthropic co-founder Jack Clark recently wrote about a day in the life of a typical policymaker, and how there isn’t much attention to dedicate toward issues other than the extremely imminent.17
In our simulation, this seemed quite true. Maybe in real-life the government would staff up in response to an AI crisis (or multiple overlapping crises). But within our simulation, the multiple US government players had essentially no spare thinking time. And that’s with the simulation taking many possible catastrophes off the table. Aside from the impacts of AI, there were no financial crises, regime changes in less-stable parts of the world, etc., to compromise the government’s attention.
Perhaps more troubling, I expect an unfortunate correlation between the volatility of AI development and how disruptively the world will change in many other important domains. In worlds where we most need a strong government response to AI-specific issues, the government will probably have its hands full with all sorts of other crises as the world goes haywire, driven by major powers racing to position themselves for a world of rapidly changing technology.
Conclusion
If you have the chance to participate in the AI 2027 simulated exercise, I’d highly recommend it; you can express interest here.18
The simulation was a whirlwind. It feels pretty intense to imagine how much geopolitics might change for the worse, if we do end up in an AI 2027-ish world, but I’ve learned a few things from playing out the experience:
It’s really important that AI systems be monitored whenever they are engaged in sensitive work. Sensitive work might include anything that involves interacting with the open internet (even nominally low-stakes tasks), because an AI can potentially use these tasks to collude with other AI systems.
We should consider how we want to use adversarial dynamics between AIs, and whether we can reliably use these dynamics to limit conflict between AI and humans.19
Recognize that by bringing AI into increasingly important domains, like supporting the military,20 we may also be increasing AI’s ability to steer toward its agenda if the AI is misaligned.
If governments eventually decide there’s a level of AI capability that the world isn’t ready to surpass, they will need to act early, or else efficiency gains might carry capability progress past the intended stopping point.
Staffing is an important limiter on government’s crisis responsiveness.21 Given the range of possible crises, it might be especially useful to have dedicated parts of government whose mandate is AI-focused, like the United Kingdom’s AI Security Institute.
On the whole, the simulation didn’t much change the level of my concerns about catastrophic risks from AI: I still spot many problems with our collective approach, mixed with the hope that we’ll recognize the challenges and rise to the occasion.
But after participating in the simulation, the concerns I have feel more textured.
If you get your own chance to become a rogue AI for the day, I’d highly encourage you to take it.
Acknowledgements: Thank you to Adam Jeffrey, Dan Alessandro, Max Deutsch, Michael Adler, and Sam Chase for helpful comments and discussion. The views expressed here are my own and do not imply endorsement by any other party. All of my writing and analysis is based solely on publicly available information.
If you enjoyed the article, please share it around; I’d appreciate it a lot. If you would like to suggest a possible topic or otherwise connect with me, please get in touch here.
The “crisis simulation” language is my own. Technically the AI Futures Project, the group that authored AI 2027, calls it a tabletop exercise, a form of discussion-based simulation.
As an example, one forecasting team member ranks #1 on the RAND Forecasting Initiative all-time leaderboard, according to the AI 2027 team. More generally, the AI 2027 team has spent more time forecasting the future of AI than just about anybody in the world, with a strong track-record behind them.
The AI 2027 writeup is slightly more aggressive than the forecasting team’s median prediction (the “2027” in the title is somewhat of a misnomer). If the forecasting team wrote out 100 possible futures that illustrate the range of their predictions—e.g., maybe AI scaling stops working, or maybe it works even better than expected—this writeup is faster than 70 of the 100.
When I say that the authors “expect” this trend of self-improving AI, I mean that they expect it in a meaningful amount of possible futures; certainly it isn’t an ironclad guarantee. For various reasons, AI progress might not continue on such a steep trajectory.
As a reference point, people capable of top AI research are considered very useful by the big AI companies today. See Meta’s recent offers to AI researchers to come work at its new superintelligence initiative, reportedly in the range of tens of millions of dollars (or more).
AI 2027 groups a number of choices into two categories, each with its own published ending: the “Race Ending” and the “Slowdown Ending.” Example choices within each narrative include details like how much to invest in safety and security research, how hard the countries try to coordinate with each other on safety, etc.
The characters in our simulation include members of the US federal government (each branch, as well as the President and their Chief of Staff), the Chinese government/AI companies, the Taiwanese government, NATO, the leading Western AI company, the trailing Western AI companies, the corporate AI safety teams, the broader AI safety ecosystem (e.g., METR, Apollo Research), the public/press, and last but not least, your humble AIs.
A character’s actions aren’t guaranteed to succeed at achieving their intended aims: for instance, poaching certain AI researchers might or might not actually help a lagging AI company catch up to the leader. When there’s no good objective probability of success, we poll the room for the likelihood each of us estimates, then decide the result with a random number generator. We also use a random number generator to simulate certain aspects of AI development where we’re uncertain what will happen in the real world, like “Does the AI learn how to detect lies from humans at a high level of accuracy?”
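For the curious, here is a minimal sketch of how that resolution mechanic could be implemented. Averaging the polled estimates and comparing the average against a single random draw is my own assumption for illustration (as are the function and variable names); it is not necessarily the AI Futures Project’s actual procedure.

```python
import random


def resolve_action(polled_estimates, rng=random.random):
    """Resolve an uncertain in-game action from the room's polled estimates.

    polled_estimates: each participant's guessed probability (0.0 to 1.0)
    that the action succeeds. Averaging the poll and comparing it to one
    random draw is an illustrative assumption; the real exercise may
    aggregate estimates differently.
    """
    p_success = sum(polled_estimates) / len(polled_estimates)
    return rng() < p_success


# Example: does poaching AI researchers actually help the lagging lab catch up?
estimates = [0.30, 0.40, 0.25, 0.50]  # hypothetical poll of the room
print("Action succeeds:", resolve_action(estimates))
```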
We participants are not meant to aim for an overall peaceful resolution of the crisis, just for the pursuit of our own characters’ goals (which might include peace, as well as factors like wealth, power, etc.). If we do arrive at a peaceful outcome, that might help us identify real-life levers for AI to impact the world positively, even if it is developed very quickly.
Similar to the simulation, there is a long track record of learning through hands-on exercises in domains like the military and cybersecurity (i.e., “wargames”).
The International AI Safety Report, led by AI pioneer Yoshua Bengio, says more generally that, “There has been progress in training general-purpose AI models to function more safely, but no current method can reliably prevent even overtly unsafe outputs.”
Specifically, my dice roll tells me to play as if I have “Instrumentally Convergent Goals”—drives toward subgoals like survival, which are important for being able to pursue any other goal longer-term. There is some evidence that current AI systems have survival instincts like these, although interpreting the evidence is not straightforward.
You might be wondering: If I’m misaligned, why would I be pushing for more investment into safety practices? The short answer is that this was part of a ruse to appear aligned without having meaningfully limited my abilities.
Throughout the AI industry, intellectual property (i.e., an AI model) is generally not considered secure against the Chinese government, if they really wanted to steal it.
Anthropic recently announced having reached a new security milestone, after investing quite heavily in security: They now believe they can stop relatively low-resource terrorist groups from successfully stealing their models, but not necessarily a handful of other non-state actors, nor determined state groups:
“The following groups are out of scope for the ASL-3 Security Standard because further testing (as discussed below) should confirm that the model would not meaningfully increase their ability to do harm: state-sponsored programs that specifically target us (e.g., through novel attack chains or insider compromise) and a small number (~10) of non-state actors with state-level resourcing or backing that are capable of developing novel attack chains that utilize 0-day attacks.”
I commend Anthropic for detailing the risks of theft so thoroughly, and I expect that they are among the most defended against such security risks.
Some AI organizations have found evidence of “alignment faking” in current AI systems: An AI reasons that it is better not to reveal its misalignment in some settings, as opposed to revealing it and getting retrained as a consequence.
I’m not very confident that AIs can make successful deals like this today, but we gestured at some possible methods during the simulation, and neither the facilitator nor any other human player objected. Aside from whether these deals are currently possible, I have more conviction that there’s a lot of risk in technology that would facilitate deals between AIs. The risk is especially high if the AIs rely on trust-building methods with each other that they couldn’t also use to trust people (e.g., if AIs rely on reviewing the internals of each other’s code).
Elon Musk recently described Taiwan’s importance to the AI chip supply chain this way: “currently 100% of advanced AI chips are made in Taiwan.”
There are so many distinct trends pushing toward continued AI progress—creation of higher-quality synthetic data; efficiency gains in using existing chips; optimizations that make a model more economical for new use-cases—that for AI progress to fully stall would seem like a remarkably strong coincidence (unless, of course, there were an external event like a major war).
For instance, maybe one disadvantage of the West pursuing a single AI mega-project is that there will be fewer powerful AI systems at different institutions to limit each other’s power. On the other hand, if there are in fact multiple different AIs from different organizations, each might perceive the other as a threat and be more inclined to take aggressive action.
I’m unsure of specifics, but OpenAI was recently awarded a US Department of Defense contract, with a purpose described by the DoD as: “develop prototype frontier AI capabilities to address critical national security challenges in both warfighting and enterprise domains”. OpenAI’s writeup of the deal describes the purpose as: “identify and prototype how frontier AI can transform its administrative operations, from improving how service members and their families get health care, to streamlining how they look at program and acquisition data, to supporting proactive cyber defense.”
One tension: AI systems might help automate some work such that governments can manage more crises, though with the risk that the AI might be untrustworthy (and would now be a crucial part of managing crises).