Nice idea! This could all be worked out later in an implementation phase if this idea gains traction, but I wonder if there would be some difficulty in specifying when the testing period starts. To use your (excellent) refueling analogy, I suppose it is easy to measure exactly when the car comes to a halt, or something similar, whereas I wonder if AI companies seeking to avoid the spirit of the minimum testing period would say that they start testing once the base model is trained, but continue doing RL post-training alongside the safety testing, so that the safety testing window isn't actually adding time until the release.
Yeah I’d be pretty disappointed if that’s how it shook out - totally agree it might though. I think most conservative approach would be “you must be testing the exact set of weights you’re going to ship as the model” - but other tradeoffs in that of course
Minimum refueling time is a very interesting hook and illustration - literal racing dynamics!
Nice idea! This could all be worked out later in an implementation phase if this idea gains traction, but I wonder if there would be some difficulty in specifying when the testing period starts. To use your (excellent) refueling analogy, I suppose it is easy to measure exactly when the car comes to a halt, or something similar, whereas I wonder if AI companies seeking to avoid the spirit of the minimum testing period would say that they start testing once the base model is trained, but continue doing RL post-training alongside the safety testing, so that the safety testing window isn't actually adding time until the release.
Yeah I’d be pretty disappointed if that’s how it shook out - totally agree it might though. I think most conservative approach would be “you must be testing the exact set of weights you’re going to ship as the model” - but other tradeoffs in that of course