2 Comments
Gaurav Yadav

Hi Steven, thanks for writing this. I have some comments, but first I want to say that it’s easy to criticise ideas in this space, and much harder to actually put them forward, so I appreciate you doing that. That said, I do have a few questions:

From the way you’ve written this, it sounds like the ‘outside view’ is that racing to the top is a legitimate theory of change that people are actively relying on. I’d be surprised if that’s true. Even a moderately sceptical reader could spot the flaws you mention. Are people really counting on labs racing to the top as a strategy? What’s your sense of the median view here?

In the ideas section, a few of the proposals didn’t seem to tackle the core problem of adoption. (I’m hoping to write about a supervision-based idea you didn’t mention, and I’d be interested to hear your take on it.) Take the minimum testing period, for example: what prevents labs from lobbying for a shorter period? Couldn’t they just argue that labs in the PRC might catch up, and use that to push for an exemption? If you’ve covered this in the linked post, feel free to just point me to it.

On licensing: who’s actually issuing these licences? From what I gathered, it’s the US government. If that’s right, my main concern is enforcement. Once a licence is revoked, what stops a lab from continuing development anyway? Do we expect the government to be technically competent and well-informed enough to even know it’s happening? And if we imagine scenarios where AI is doing AI R&D, what do licences actually constrain, and what would a licence actually stop?

On liability (sorry, this isn’t framed as a question, but I thought I’d share my thoughts): yes, it’s politically difficult. But it seems to me the point of liability isn’t that courts will fix things after catastrophic harms happen. Rather, it’s another tool (like licensing or financial incentives) to slow things down beforehand. Whether it actually works in practice is a fair question, and I think it’s worth being sceptical, for example: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3563452

On transparency: I agree it matters, but don’t we already have some of this? For instance, model reports and red-teaming from groups like Apollo already show that models can be prompted to scheme. So what do you see as the actual effect of more transparency? What changes, in your view, if we get it?

Steven Adler

Hey, these are great questions, appreciate you asking them!

Let me try to go through them fairly off-the-cuff:

“Are people really counting on RTTT?” I’m not sure, but I really hope not; it isn’t obvious from media coverage of RTTT, though. I’d love it if, anytime a group touted RTTT as part of its strategy, it also emphasized all the various things that need to happen alongside it. But this doesn’t seem to happen, in my estimation. If the only thing this post accomplished were common knowledge that ‘nobody in AI safety thinks RTTT is enough’, that would still be useful from my perspective (though hopefully it accomplished more than that).

Re: “core problem of adoption”, I’d argue that ‘lobbying for a shorter testing period’ is more an issue of the sufficiency of the practices than of adoption, but I hear what you’re saying. Geopolitics is definitely a tricky issue, and IIRC I call it my biggest uncertainty in the section on tradeoffs: https://stevenadler.substack.com/p/a-minimum-testing-period-for-frontier

Re: licensing, yup, it’s probably a government issuing them, and what stops a lab from proceeding is general rule-of-law stuff (i.e., knowing they can be sued/sanctioned/etc. if they violate the rules). It’s a good question whether the government is positioned to know, but one piece of such a law could be mandated disclosure (like how in the Biden admin there was a requirement to disclose if you were training a model above a certain size).

Re: liability, very fair. I agree it can have ex-ante effects even if harms can’t be fully rectified post-catastrophe, though if people anticipate that harms can’t be rectified ex post, the ex-ante effect will probably be smaller.

Re: transparency, ehhh, only voluntarily, and I think adherence is spottier than we would like. The idea of mandating transparency is to set clearer parameters on what needs to be disclosed, so that it’s less at companies’ discretion. There’s some discussion of this in the minimum testing period piece as well, for instance in footnote 12.

Thanks again for the questions and the thorough engagement.
