Back to Blogs

OpenAI o1: A New Era of AI Reasoning

September 13, 2024
5 mins
blog image

While September is typically associated with Apple’s releases, OpenAI made waves of its own!

OpenAI o1 series is a new class of large language models (LLMs) designed to enhance reasoning capabilities through a "thinking before responding" approach. With its advanced reasoning capabilities, o1 is setting a new standard for how artificial intelligence can solve complex reasoning tasks in math, coding, science, and beyond. 

This explainer will walk you through what OpenAI o1 is, how it functions, and why it’s such an exciting breakthrough for anyone tackling difficult, technical challenges.

What is OpenAI o1?

OpenAI o1 is a new generation of AI models focused on advanced reasoning. Unlike many AI models that provide immediate answers based on broad knowledge, o1 distinguishes itself by taking time to think through complex tasks. It employs a "chain of thought" approach, which allows it to carefully analyze and break down complex problems, solving them step by step for more accurate and insightful results.

undefined

This approach enables o1 to surpass previous models—like GPT-4o—on tasks demanding deep understanding and logical problem-solving. By scoring in the 89th percentile in coding competitions and ranking among the top 500 U.S. high school students in a prestigious math exam, o1 demonstrates its prowess as a powerful AI tool for tackling complex STEM challenges.

light-callout-cta Learn how to use GPT-4o in you data annotation pipeline. Read the blog How to Pre-Label Your Data with GPT-4o

The o1 Model Series: Preview and Mini Versions

OpenAI o1 is available in two versions: o1-preview and o1-mini, each tailored for different use cases.

  • OpenAI o1-preview is the advanced version, excelling in reasoning-heavy tasks such as coding, math, and science. It surpasses human experts on academic benchmarks in physics, chemistry, and biology.
  • OpenAI o1-mini is a faster, more cost-effective option for developers needing reasoning power without extensive general knowledge. While specializing in STEM fields, it remains highly capable in competitive programming and advanced math.

Both new models are accessible via ChatGPT and the API. O1-mini offers an 80% cost reduction compared to o1-preview, making it an attractive choice for users prioritizing speed and affordability.

How o1 Model Work: Learning to Reason

A standout feature of the o1 series is its ability to reason through problems using a chain of thought. This approach mimics human problem-solving: breaking down complex questions, recognizing mistakes, and refining approaches over time.

Reinforcement Learning

OpenAI o1 uses reinforcement learning (RL) to develop its reasoning skills. Through this process, the AI model is trained to improve its problem-solving strategies. For example, if o1 makes an error in a complex math equation, it learns to correct that mistake by trying new approaches and refining its solution process. The more time it spends thinking, the more accurate it becomes.

This RL approach leads to higher performance on reasoning-heavy tasks, which require more than just factual recall. It enables the model to improve with test-time compute—the more thinking time it gets, the better it performs.

Performance Highlights and Benchmarks

OpenAI o1 performance and benchmarks

Learning to reason with LLMs

OpenAI o1 has been tested on a variety of benchmarks, and its performance is nothing short of impressive:

  • Math Performance: In the prestigious AIME (American Invitational Mathematics Examination), a qualifying round for the USA Math Olympiad, o1-preview scored 83%, solving some of the toughest problems high school students face. By comparison, GPT-4o only managed 12%.
  • Coding: On Codeforces, a competitive coding platform, o1-mini achieved an Elo rating of 1650, placing it among the top 14% of all competitors, while o1-preview reached the 89th percentile. This makes it an invaluable tool for developers working on complex algorithms.
  • STEM Reasoning: In academic benchmarks like GPQA (science) and MATH-500, o1-mini outperformed GPT-4o, showcasing its superior reasoning abilities

These benchmarks demonstrate that o1 is not just faster or more powerful—it’s also much more intelligent when it comes to tackling problems that require deep thought compared to the previous GPT family, the GPT-4o.

Model Speed and Cost Efficiency

While the o1-preview model is an advanced, highly capable reasoning tool, OpenAI also offers a more efficient alternative: o1-mini. This smaller version of the model delivers comparable performance in many STEM tasks but at a fraction of the cost.

For example, o1-mini is up to 5x faster than o1-preview in some tasks, making it ideal for developers who need quick responses without sacrificing accuracy. Its reduced price—80% cheaper than o1-preview—makes it an attractive option for cost-conscious users.

Safety and Alignment in o1 Models

One of the critical advancements in the o1 series is its improved safety and alignment capabilities. Using the chain of thought, the reasoning model is trained to reason through safety guidelines, ensuring it adheres to ethical boundaries even in tricky or edge-case scenarios.

For example, o1-preview was tested on some of the hardest jailbreaking evaluations, where users try to bypass safety restrictions. It significantly outperformed previous models, scoring 84 out of 100 on safety adherence, compared to GPT-4o’s score of 22.

OpenAI o1 performance and benchmakrs

Learning to reason with LLMs

By teaching the model to integrate safety principles into its reasoning process, OpenAI has made o1 not only smarter but also safer and more aligned with human values.

light-callout-cta For more information, read the OpenAI o1 System Card.

Hiding the Chain of Thought

Though the o1 models internally use a chain of thought to arrive at answers, this process remains hidden from the user. OpenAI made the decision to summarize the reasoning process rather than showing the raw chain of thought, balancing user experience and competitive advantage.

This hidden chain of thought allows OpenAI to monitor the model’s internal reasoning, offering insights into how it arrives at decisions while protecting users from potentially confusing or overly complex intermediate steps.

Limitations and What’s Next

While the o1 models excel in STEM reasoning, they do have some limitations:

  • Limited Non-STEM Knowledge: o1-mini, in particular, is not as strong in general knowledge or language-focused tasks, such as answering trivia or biographies.
  • Room for Improvement: Future versions of the o1 series aim to address these gaps, expanding its capabilities to other modalities like browsing the web and uploading files or images.

OpenAI continues to iterate on the o1 series, with plans to release improved models and add more functionality in future updates.

OpenAI o1: How to Use?

  • ChatGPT Plus and Team Users: You can access both o1-preview and o1-mini in ChatGPT starting today. Use the model picker to select between the two. Initially, you will have a weekly message limit of 30 for o1-preview and 50 for o1-mini. 
  • ChatGPT Enterprise and Edu Users: You will be able to access both o1 models beginning next week. This access will come with higher rate limits and additional features.
  • API Tier 5 Developers: If you qualify for API usage tier 5, you can start using both o1-preview and o1-mini today with a rate limit of 20 RPM. Note that the API currently does not support function calling, streaming, or system messages. 
  • ChatGPT Free Users: OpenAI plans to extend access to o1-mini to all ChatGPT Free users in the future.
  • Cost and Performance: o1-mini is available now to API Tier 5 users at a cost 80% lower than o1-preview, offering a more affordable option with impressive performance.

Key Highlights: OpenAI o1

  • Advanced Reasoning: Uses a chain of thought to tackle complex STEM problems.
  • STEM Performance: Excels in math (83% on AIME), coding (89th percentile on Codeforces), and science (outperforms PhDs on GPQA).
  • Two Versions: Full-featured o1-preview and cost-effective o1-mini (80% cheaper).
  • Reinforcement Learning: Trained to improve problem-solving and correct mistakes.
  • Hidden Chain of Thought: Internally monitors reasoning for improved safety without exposing raw thought process.
sideBlogCtaBannerMobileBGencord logo

Power your AI models with the right data

Automate your data curation, annotation and label validation workflows.

Book a demo
Written by
author-avatar-url

Akruti Acharya

View more posts

Explore our products