Contents
What’s New in GPT-5?
GPT-5 Family
GPT-5: Performance
How to Access GPT-5
GPT-OSS: OpenAI’s Open-Weight Models
Use Cases: When to Use What
Key Takeaways
Encord Blog
GPT-5: A Technical Breakdown

Curious about what’s new in OpenAI’s GPT-5? In this technical breakdown, we cover its architecture, performance benchmarks, use cases, and how it compares to OpenAI’s open-source model GPT-OSS.
What’s New in GPT-5?
GPT‑5 is OpenAI’s most capable model yet. It is smarter, faster, and more useful for real-world workflows. It’s not just about scale. GPT‑5 offers high-fidelity coding, front-end UI generation, and precise debugging with just one prompt.
Example of using GPT-5 to build a data visualization playground. Source
It supports massive context windows with up to 400,000 tokens via the API (272k input + 128k output). Its reasoning variant sharpens logical thinking and produces smoother, more coherent responses. For developers, new controls like ‘verbosity’ and ‘reasoning_effort’ let you customize response detail and compute use per call.
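The per-call controls above can be sketched as a request payload. This is a minimal sketch assuming the parameter shapes of OpenAI's Responses API (`reasoning.effort`, `text.verbosity`); the payload is built as a plain dict, so no API key or network call is involved.

```python
# Sketch: a GPT-5 Responses API request that tunes reasoning effort
# and verbosity per call. Sending it would require an API key; here we
# only construct the payload.

def build_request(prompt: str, effort: str = "medium", verbosity: str = "medium") -> dict:
    """Build a request payload with per-call effort/verbosity controls."""
    assert effort in {"minimal", "low", "medium", "high"}
    assert verbosity in {"low", "medium", "high"}
    return {
        "model": "gpt-5",
        "input": prompt,
        "reasoning": {"effort": effort},   # more effort -> more compute spent thinking
        "text": {"verbosity": verbosity},  # controls response length and detail
    }

payload = build_request("Summarize this contract.", effort="minimal", verbosity="low")
print(payload["reasoning"]["effort"])  # minimal
```

Dialing effort down to `minimal` trades reasoning depth for latency, which suits simple extraction or classification calls.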
Here are some of the key features:
Multi-Stage Model Routing
GPT-5 uses a hierarchical routing system with at least two internal models:
- Fast Model: Handles standard queries with low latency.
- Reasoning Model: Activated automatically for complex prompts or manually via phrases like “take your time” or “think step by step.”
This system enables dynamic allocation of compute, reducing latency while preserving output quality.
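OpenAI has not published how its router works, but the idea can be illustrated with a toy heuristic: short, ordinary prompts go to the fast model, while trigger phrases or long prompts escalate to the reasoning model. Everything here (trigger list, word threshold) is invented for illustration.

```python
# Toy illustration of hierarchical routing (NOT OpenAI's actual router):
# cheap heuristics decide whether a prompt needs the reasoning model.

REASONING_TRIGGERS = ("take your time", "think step by step", "think hard")

def route(prompt: str, max_fast_words: int = 200) -> str:
    """Return which internal model a prompt would be routed to."""
    text = prompt.lower()
    if any(trigger in text for trigger in REASONING_TRIGGERS):
        return "reasoning"  # explicit user request for deeper thinking
    if len(prompt.split()) > max_fast_words:
        return "reasoning"  # long prompts tend to need more compute
    return "fast"

print(route("What's the capital of France?"))          # fast
print(route("Think step by step: prove this lemma."))  # reasoning
```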
Improved Tool Use and Function Calling
GPT-5 improves tool-use capabilities:
- More accurate function signature interpretation
- Improved argument formatting and type inference
- Better multi-function execution in a single pass
The model is also better at generating valid JSON and structured outputs, improving integration with APIs and downstream applications.
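To make the function-calling flow concrete, here is a sketch of a tool schema in the OpenAI function-calling style, plus a check that model-produced arguments match it. The weather tool, its fields, and the sample arguments are all made up for illustration.

```python
import json

# Sketch: a function-calling tool schema and a validator for the
# JSON arguments a model emits when it calls the tool.

WEATHER_TOOL = {
    "type": "function",
    "name": "get_weather",  # hypothetical tool, not a real API
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

def validate_call(tool: dict, arguments_json: str) -> dict:
    """Parse tool-call arguments and check required fields and enums."""
    args = json.loads(arguments_json)  # fails fast on invalid JSON
    params = tool["parameters"]
    for field in params["required"]:
        if field not in args:
            raise ValueError(f"missing required field: {field}")
    for key, value in args.items():
        spec = params["properties"].get(key)
        if spec and "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"invalid value for {key}: {value}")
    return args

# Arguments as a model might emit them in a tool call:
args = validate_call(WEATHER_TOOL, '{"city": "Oslo", "unit": "celsius"}')
print(args["city"])  # Oslo
```

Fewer schema violations from the model means this kind of validation fails less often in production pipelines.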
Enhanced Agentic Behavior
GPT-5 performs better on multi-step tasks, long-context workflows, and goal-directed reasoning. It tracks intermediate steps more reliably and reduces the need for human intervention during task planning or execution.
Higher Accuracy and Safety
Compared to GPT-4:
- Fewer hallucinations in factual and technical tasks
- Reduced instruction-following failures
- Better behavior alignment in safety-critical applications (e.g. healthcare, legal)
Developer-Oriented Features
GPT-5 in the API includes:
- Reproducibility via seed setting
- Improved JSON mode for structured outputs
- Enhanced function calling for toolchain integration
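The three developer features above can be combined in one request. This sketch assumes a Chat Completions-style payload with a `seed` field and a `json_schema` response format; the `ticket` schema is invented for the example, and reproducibility via seed is best-effort, not guaranteed.

```python
# Sketch: combining a fixed seed (best-effort reproducibility) with a
# JSON schema for structured output. Built as a dict; no call is made.

def build_structured_request(prompt: str, seed: int = 42) -> dict:
    return {
        "model": "gpt-5",
        "messages": [{"role": "user", "content": prompt}],
        "seed": seed,  # same seed + same params -> best-effort deterministic output
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "ticket",  # hypothetical schema for a support-ticket classifier
                "schema": {
                    "type": "object",
                    "properties": {
                        "title": {"type": "string"},
                        "priority": {"type": "string", "enum": ["low", "high"]},
                    },
                    "required": ["title", "priority"],
                },
            },
        },
    }

request = build_structured_request("Classify: 'checkout page is down'")
```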
GPT-5 Family
GPT-5 (Base)
Flagship model hosted by OpenAI. Handles long-context, multimodal tasks with top-tier performance.
Best Use Case: Complex tasks, agents, RAG, multimodal reasoning
GPT-5 Mini
Smaller, faster variant with a balance between speed and capability. Ideal for real-time workflows.
Best Use Case: Lightweight agents, fast API calls, summaries
GPT-5 Nano
Edge-optimized version for on-device use. Reduced capabilities, but privacy-preserving and low-latency.
Best Use Case: Mobile apps, embedded systems, offline agents
GPT-5 Pro
Advanced variant built for the most challenging reasoning tasks. Uses scaled, efficient parallel test-time compute to deliver the most comprehensive answers.
Best Use Case: High-stakes reasoning in science, math, health, and code. Preferred in 67.8% of expert evaluations over GPT-5 Thinking.
GPT-5: Performance
While OpenAI has not released detailed architecture specs or training data sources, benchmark results confirm that GPT-5 is their most capable model to date.
Academic and Reasoning Benchmarks
- Math (AIME 2025, no tools): GPT-5 achieves 94.6% accuracy, up from GPT-4o’s 42.1%.
- SWE-bench Verified: 52.8% accuracy, showing stronger coding skills without thinking mode.
- Healthcare (HealthBench Hard): Scores 67.2% with thinking mode, a notable gain in domain-specific reasoning.
- Multimodal Understanding (MMMU): 84.2%; performs well on tasks involving images, video, spatial understanding, and scientific problem-solving.
Fine-Tuned Assistants
When evaluated on agent-like assistant benchmarks (e.g., coding assistants, research agents), GPT-5 demonstrates improved memory consistency, goal tracking, and function usage. It more reliably:
- Calls external functions using correct schemas.
- Maintains context across multi-turn interactions.
- Produces valid structured output on request.
Reliability in Tool Use
Function-calling is more robust in GPT-5. It generates tool-structured outputs with:
- Higher accuracy and lower hallucination rates.
- Fewer schema violations in JSON outputs.
- More stable behavior when calling multiple tools in sequence.
How to Access GPT-5
GPT-5 is available through:
- ChatGPT (chat.openai.com): Enabled by default in the model selector for paid users. Automatically routes to GPT-5 in most cases.
- OpenAI API (platform.openai.com): Accessible via the gpt-5 model family. Supports both single-call and streaming interfaces.
- Azure OpenAI Service: Available under the GPT-5 deployment names depending on your region and subscription.
- Third-party apps & integrations: GPT-5 powers assistants in Microsoft products (e.g. Word, Excel) and other OpenAI API partners.
You don’t need to tweak your prompts: GPT-5 works with prompts built for GPT-4 Turbo, but gives you better reasoning, stronger multilingual support, and longer context handling.
GPT-OSS: OpenAI’s Open-Weight Models
OpenAI also recently released two open-weight large language models, its first open-weight release since GPT-2.
- gpt-oss-120b: ~117B total parameters, ~5.1B active per token.
- gpt-oss-20b: ~21B total, ~3.6B active.
Both models use a Mixture-of-Experts (MoE) architecture. Only a subset of parameters is active during inference, which improves efficiency and reduces computational cost.
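The active-parameter figures above follow directly from MoE routing: a gate scores the experts for each token and only the top-k run. Here is a toy numeric sketch of that idea; real MoE layers use learned gates inside every transformer block, not hand-set scores.

```python
# Toy Mixture-of-Experts forward pass: only the k highest-scoring
# experts run for a given token, so most parameters stay inactive.

def moe_forward(token: float, experts, gate_scores, k: int = 2) -> float:
    """Run only the k top-scored experts and mix their outputs by gate weight."""
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    total = sum(gate_scores[i] for i in top)  # renormalize over the selected experts
    return sum(gate_scores[i] / total * experts[i](token) for i in top)

# Four tiny "experts", each just scaling its input:
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
out = moe_forward(1.0, experts, gate_scores=[0.1, 0.2, 0.3, 0.4], k=2)
# Only the two highest-scored experts contributed to `out`.
```

With 4 experts and k=2, half the expert parameters are touched per token; gpt-oss pushes the same idea much further (~5.1B active of ~117B total).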
Key Features
- Apache 2.0 license: Fully open for commercial and research use
- Supports 128K context: Thanks to RoPE extension and sliding window attention
- Compatible with open inference engines: Tested with vLLM, TGI, Hugging Face Transformers
- Can run on a single 24GB consumer GPU (gpt-oss-20b), or a single H100 (gpt-oss-120b)
- Instruction-tuned versions available: Released alongside base models
Architecture
The models use Group Query Attention (GQA) and sliding window attention. These techniques help support long-context inference and improve efficiency across hardware setups. The models are trained with RoPE embeddings extended to 128K context length, making them suitable for use in RAG systems.
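Sliding-window attention can be pictured as a banded causal mask: each token attends only to itself and a fixed number of preceding tokens, which bounds attention cost at long context lengths. The window size below is illustrative, not the one gpt-oss uses.

```python
# Sketch of a causal sliding-window attention mask: token i may attend
# to token j only if j is within the last `window` positions up to i.

def sliding_window_mask(seq_len: int, window: int):
    """mask[i][j] is True where token i may attend to token j."""
    return [
        [max(0, i - window + 1) <= j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sliding_window_mask(seq_len=5, window=3)
print(mask[4])  # token 4 attends only to tokens 2, 3, 4
```

Because each row has at most `window` True entries, attention memory grows linearly in sequence length instead of quadratically.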
Performance
gpt-oss-120b performs competitively with OpenAI’s o4-mini:
- Strong results on MMLU, GPQA, AIME, and Codeforces
- Outperforms o3-mini in math, health, and science at a smaller scale
This makes GPT‑OSS viable for production environments where hosted solutions aren’t an option.
Intended Application
GPT-OSS is designed to support:
- Long-context applications
- RAG
- Tool use and agentic workflows
- Instruction-following tasks
How To Access GPT-OSS
- Download & Self‑Host (Open-Weight): The model weights (gpt-oss-120b, gpt-oss-20b) are freely downloadable under the Apache 2.0 license from Hugging Face or GitHub.
- Use via Inference Providers (Managed Hosting): If you prefer not to self-host, GPT‑OSS models are accessible via managed platforms like Hugging Face Inference Providers.
- You cannot access it via OpenAI’s hosted API or ChatGPT platforms.
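Since self-hosted GPT-OSS servers (for example vLLM) typically expose an OpenAI-compatible endpoint, calling your own deployment is just a chat-completions payload pointed at your own host. The localhost URL and port below are assumptions based on vLLM's defaults; adjust them for your setup. Only the payload is built here, no request is sent.

```python
# Sketch: a request to a self-hosted GPT-OSS model behind an
# OpenAI-compatible server. URL/port are assumed local defaults.

def build_local_request(prompt: str) -> tuple:
    url = "http://localhost:8000/v1/chat/completions"  # assumed local vLLM server
    payload = {
        "model": "openai/gpt-oss-20b",  # Hugging Face model id
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return url, payload

url, payload = build_local_request("Explain RoPE in one paragraph.")
```

Because the endpoint shape matches OpenAI's, existing OpenAI-client code can usually be repointed at the local URL with no other changes.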
Use Cases: When to Use What
GPT-5
Use GPT-5 when you need top-tier performance:
- Handles text, image, audio, and video
- Strong at multilingual, spatial, and scientific reasoning
- Ideal for production, large context (up to 400K tokens via the API), and commercial deployment via API or ChatGPT
GPT-OSS
Use GPT-OSS when you need full control:
- Runs locally, inspectable weights
- Good for fine-tuning, domain adaptation, or academic work
- Ideal for building open-source tools or constrained deployments
Bottom line
Need accuracy and scale? Use GPT-5.
Need transparency and control? Use GPT-OSS.
Key Takeaways
- GPT‑5 is OpenAI’s most advanced model, with better reasoning, 400K context, and improved tool use.
- Includes Mini, Nano, and Pro variants, optimized for different use cases—from edge devices to high-stakes reasoning.
- GPT‑OSS offers open-weight models (120B & 20B) with MoE and 128K context, great for transparency and local use.
- Available via ChatGPT, API, Azure, and Microsoft apps, and works with existing GPT-4 prompts.
- Use GPT‑5 for performance, GPT‑OSS for control and openness.
Explore our products