OpenAI is taking a firm stance against users who attempt to explore the inner workings of its new AI models.
Since the recent launch of the “Strawberry” AI model family, which includes the o1-preview and o1-mini models, OpenAI has been issuing warnings and threatening bans to users trying to investigate the models’ thought processes.
The o1 model, unlike its predecessors, was designed to engage in a step-by-step reasoning process before generating answers. While users can view a filtered version of this reasoning in the ChatGPT interface, OpenAI deliberately hides the raw “chain of thought.” The company uses another AI model to generate a summarized interpretation for users.
Despite this, many tech enthusiasts, hackers, and researchers are attempting to access the hidden reasoning process using various techniques such as jailbreaking or prompt injection. Reports have surfaced suggesting some users might have found a way to reveal the raw thought chains, although nothing has been officially confirmed.
In response, OpenAI has been actively monitoring these activities and issuing warnings to those who violate its policies. Some users, including well-known figures in the AI community, have reported receiving emails from the company, cautioning them to stop any attempts to uncover the model’s reasoning mechanisms. For example, users who mentioned terms like “reasoning trace” during interactions with o1 were flagged and warned by OpenAI. These emails stress compliance with OpenAI’s Terms of Use, warning that repeated violations could lead to bans from the platform.
OpenAI’s reasoning for keeping the model’s raw thought process hidden is multifaceted. The company acknowledges that revealing these chains could expose valuable data to competitors or allow for user manipulation. This decision has drawn criticism from parts of the AI community, with some arguing that it compromises transparency and interpretability. Independent researchers like Simon Willison believe that OpenAI’s move to conceal the model’s inner workings stifles transparency and development within the AI field.
The Guardian, Ars Technica, and MIT Technology Review contributed to this report.