OpenAI has unveiled o3, an enhanced version of its most advanced artificial intelligence model, which is designed to take even more time to analyze questions. The announcement comes just one day after Google disclosed its first model in this category. The new o3 model replaces o1, which OpenAI introduced in September. Like its predecessor, o3 spends additional time reasoning before it responds, enabling it to produce better answers to queries that require intricate logical deduction. OpenAI opted not to use the name "o2" because it is already associated with a mobile network in the UK.
In a live broadcast on Friday, OpenAI CEO Sam Altman declared, “We see this as the start of a new phase in AI,” emphasizing that these models can tackle increasingly complex tasks that demand significant reasoning capabilities.
According to OpenAI, o3 surpasses o1 in various assessments, particularly those measuring complex coding abilities along with advanced math and science knowledge. It reportedly performs three times better than o1 on the ARC-AGI benchmark, which evaluates an AI's capacity to reason through challenging mathematical and logical problems it has never encountered before.
Google is pursuing comparable research. Noam Shazeer, a Google researcher, revealed on X that the company has developed a reasoning model named Gemini 2.0 Flash Thinking. Google CEO Sundar Pichai referred to it as “our most thoughtful model yet.” This new model has achieved high scores on SWE-Bench, which assesses models' agentic abilities.
Despite Google’s advancements, OpenAI's o3 shows a 20 percent improvement over o1. Ofir Press, a post-doctoral researcher at Princeton University who contributed to the development of SWE-Bench, expressed surprise at the size of the leap, saying, “o3 blew it out of the water.”
The competition between OpenAI and Google is intensifying, with OpenAI needing to showcase continual progress to attract more investment and sustain a profitable business. In parallel, Google is eager to reaffirm its leading position in AI research.
The new models highlight a shift in focus within AI companies, moving beyond merely enhancing model scale to achieving greater intelligence from those models.
OpenAI mentioned two variants of the new model: o3 and o3-mini. The models are not publicly available yet, but OpenAI plans to invite external testers.
OpenAI also provided insights into the methods used to align o1, introducing a technique called deliberative alignment. The training equips the model with a set of safety guidelines and lets it reason about the nature of a request and its own response to determine whether that response complies with the guidelines. This approach makes the model more resistant to manipulation.
Although large language models excel at answering numerous queries, they often struggle with puzzles that require basic mathematical or logical skills. The o1 model includes training for step-by-step problem-solving, enhancing its capability in these challenging areas.
AI models with reasoning capabilities will be increasingly important as companies aim to create AI agents capable of independently solving complex problems for users. Mark Chen, OpenAI’s senior vice president of research, stated, “This signifies our significant progress in utility.” Altman noted, “This model excels in programming.”
While a true groundbreaking moment for the tech industry has not yet arrived, recent announcements in AI have been rapid and notable. Earlier this month, Google introduced a new version of its main model, Gemini 2.0, showcasing it as a web browsing aid and an assistant with vision capabilities via smartphone or smart glasses.
OpenAI has also made several announcements leading up to the holiday season, including a new video-generating model, a free version of its ChatGPT-powered search engine, and a system allowing users to access ChatGPT via phone by calling 1-800-ChatGPT.
This article was later updated with further remarks and details from OpenAI.