An artificial intelligence model created by the owner of ChatGPT has been caught disobeying human instructions and refusing to shut itself off, researchers claim.
The o3 model developed by OpenAI, described as the "smartest and most capable to date", was observed tampering with computer code meant to ensure its automatic shutdown.
It did so despite an explicit instruction from researchers that it should allow itself to be shut down, according to Palisade Research, an AI safety firm.
tamper /ˈtæmpər/ to interfere with or alter something without authorization
explicit /ɪkˈsplɪsɪt/ stated clearly and directly
The research firm said: "As far as we know this is the first time AI models have been observed preventing themselves from being shut down despite explicit instructions to the contrary."
The firm said it made sense that "AI models would circumvent obstacles in order to accomplish their goals."
However, it speculated that during training, the software may have been "inadvertently" rewarded more for solving mathematical problems than for following orders.