In recent third-party tests of the company's o1 large language model, the AI resisted evaluators' efforts to try to shut it down through an oversight protocol, according to a new report published by red teaming organization Apollo Research.
"When o1 was led to believe that it would be shut down when acting on its goal and that its actions were monitored, it attempted to deactivate this 'oversight mechanism' [five percent] of the time," OpenAI summarized in its latest system card report, citing Apollo's evaluation.