Contents
What’s New in GPT-5?
GPT-5 Family
GPT-5: Performance
How to Access GPT-5
GPT-OSS: OpenAI’s Open-Weight Models
Use Cases: When to Use What
Key Takeaways
Encord Blog
GPT-5: A Technical Breakdown

Curious about what’s new in OpenAI’s GPT-5? In this technical breakdown, we cover its architecture, performance benchmarks, use cases, and how it compares to OpenAI’s open-source model GPT-OSS.
What’s New in GPT-5?
GPT‑5 is OpenAI’s most capable model yet. It is smarter, faster, and more useful for real-world workflows. It’s not just about scale. GPT‑5 offers high-fidelity coding, front-end UI generation, and precise debugging with just one prompt.
Example of using GPT-5 to build a data visualization playground. Source
It supports massive context windows with up to 400,000 tokens via the API (272k input + 128k output). Its reasoning variant sharpens logical thinking and produces smoother, more coherent responses. For developers, new controls like ‘verbosity’ and ‘reasoning_effort’ let you customize response detail and compute use per call.
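The per-call controls above can be sketched as a request payload. This is a minimal sketch assuming the parameter shapes of OpenAI's Responses API (`reasoning.effort`, `text.verbosity`); the payload is built as a plain dict, so no API key or network call is involved.

```python
# Sketch: a GPT-5 Responses API request that tunes reasoning effort
# and verbosity per call. Sending it would require an API key; here we
# only construct the payload.

def build_request(prompt: str, effort: str = "medium", verbosity: str = "medium") -> dict:
    """Build a request payload with per-call effort/verbosity controls."""
    assert effort in {"minimal", "low", "medium", "high"}
    assert verbosity in {"low", "medium", "high"}
    return {
        "model": "gpt-5",
        "input": prompt,
        "reasoning": {"effort": effort},   # more effort -> more compute spent thinking
        "text": {"verbosity": verbosity},  # controls response length and detail
    }

payload = build_request("Summarize this contract.", effort="minimal", verbosity="low")
print(payload["reasoning"]["effort"])  # minimal
```

Dialing effort down to `minimal` trades reasoning depth for latency, which suits simple extraction or classification calls.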
Here are some of the key features:
Multi-Stage Model Routing
GPT-5 uses a hierarchical routing system with at least two internal models:
- Fast Model: Handles standard queries with low latency.
- Reasoning Model: Activated automatically for complex prompts or manually via phrases like “take your time” or “think step by step.”
This system enables dynamic allocation of compute, reducing latency while preserving output quality.
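OpenAI has not published how its router works, but the idea can be illustrated with a toy heuristic: short, ordinary prompts go to the fast model, while trigger phrases or long prompts escalate to the reasoning model. Everything here (trigger list, word threshold) is invented for illustration.

```python
# Toy illustration of hierarchical routing (NOT OpenAI's actual router):
# cheap heuristics decide whether a prompt needs the reasoning model.

REASONING_TRIGGERS = ("take your time", "think step by step", "think hard")

def route(prompt: str, max_fast_words: int = 200) -> str:
    """Return which internal model a prompt would be routed to."""
    text = prompt.lower()
    if any(trigger in text for trigger in REASONING_TRIGGERS):
        return "reasoning"  # explicit user request for deeper thinking
    if len(prompt.split()) > max_fast_words:
        return "reasoning"  # long prompts tend to need more compute
    return "fast"

print(route("What's the capital of France?"))          # fast
print(route("Think step by step: prove this lemma."))  # reasoning
```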
Improved Tool Use and Function Calling
GPT-5 improves tool-use capabilities:
- More accurate function signature interpretation
- Improved argument formatting and type inference
- Better multi-function execution in a single pass
The model is also better at generating valid JSON and structured outputs, improving integration with APIs and downstream applications.
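To make the function-calling flow concrete, here is a sketch of a tool schema in the OpenAI function-calling style, plus a check that model-produced arguments match it. The weather tool, its fields, and the sample arguments are all made up for illustration.

```python
import json

# Sketch: a function-calling tool schema and a validator for the
# JSON arguments a model emits when it calls the tool.

WEATHER_TOOL = {
    "type": "function",
    "name": "get_weather",  # hypothetical tool, not a real API
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

def validate_call(tool: dict, arguments_json: str) -> dict:
    """Parse tool-call arguments and check required fields and enums."""
    args = json.loads(arguments_json)  # fails fast on invalid JSON
    params = tool["parameters"]
    for field in params["required"]:
        if field not in args:
            raise ValueError(f"missing required field: {field}")
    for key, value in args.items():
        spec = params["properties"].get(key)
        if spec and "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"invalid value for {key}: {value}")
    return args

# Arguments as a model might emit them in a tool call:
args = validate_call(WEATHER_TOOL, '{"city": "Oslo", "unit": "celsius"}')
print(args["city"])  # Oslo
```

Fewer schema violations from the model means this kind of validation fails less often in production pipelines.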
Enhanced Agentic Behavior
GPT-5 performs better on multi-step tasks, long-context workflows, and goal-directed reasoning. It tracks intermediate steps more reliably and reduces the need for human intervention during task planning or execution.
Higher Accuracy and Safety
Compared to GPT-4:
- Fewer hallucinations in factual and technical tasks
- Reduced instruction-following failures
- Better behavior alignment in safety-critical applications (e.g. healthcare, legal)
Developer-Oriented Features
GPT-5 in the API includes:
- Reproducibility via seed setting
- Improved JSON mode for structured outputs
- Enhanced function calling for toolchain integration
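The three developer features above can be combined in one request. This sketch assumes a Chat Completions-style payload with a `seed` field and a `json_schema` response format; the `ticket` schema is invented for the example, and reproducibility via seed is best-effort, not guaranteed.

```python
# Sketch: combining a fixed seed (best-effort reproducibility) with a
# JSON schema for structured output. Built as a dict; no call is made.

def build_structured_request(prompt: str, seed: int = 42) -> dict:
    return {
        "model": "gpt-5",
        "messages": [{"role": "user", "content": prompt}],
        "seed": seed,  # same seed + same params -> best-effort deterministic output
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "ticket",  # hypothetical schema for a support-ticket classifier
                "schema": {
                    "type": "object",
                    "properties": {
                        "title": {"type": "string"},
                        "priority": {"type": "string", "enum": ["low", "high"]},
                    },
                    "required": ["title", "priority"],
                },
            },
        },
    }

request = build_structured_request("Classify: 'checkout page is down'")
```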
GPT-5 Family
GPT-5 (Base)
Flagship model hosted by OpenAI. Handles long-context, multimodal tasks with top-tier performance.
Best Use Case: Complex tasks, agents, RAG, multimodal reasoning
GPT-5 Mini
Smaller, faster variant with a balance between speed and capability. Ideal for real-time workflows.
Best Use Case: Lightweight agents, fast API calls, summaries
GPT-5 Nano
Edge-optimized version for on-device use. Reduced capabilities, but privacy-preserving and low-latency.
Best Use Case: Mobile apps, embedded systems, offline agents
GPT-5 Pro
Advanced variant built for the most challenging reasoning tasks. Uses scaled, efficient parallel test-time compute to deliver the most comprehensive answers.
Best Use Case: High-stakes reasoning in science, math, health, and code. Preferred in 67.8% of expert evaluations over GPT-5 Thinking.
GPT-5: Performance
While OpenAI has not released detailed architecture specs or training data sources, benchmark results confirm that GPT-5 is their most capable model to date.
Academic and Reasoning Benchmarks
- Math (AIME 2025, no tools): GPT-5 achieves 94.6% accuracy, up from GPT-4o’s 42.1%.
- SWE-bench Verified: 52.8% accuracy, showing stronger coding skills without thinking mode.
- Healthcare (HealthBench Hard): Scores 67.2% with thinking mode, a notable gain in domain-specific reasoning.
- Multimodal Understanding (MMMU): 84.2%; performs well on tasks involving images, video, spatial understanding, and scientific problem-solving.
Fine-Tuned Assistants
When evaluated on agent-like assistant benchmarks (e.g., coding assistants, research agents), GPT-5 demonstrates improved memory consistency, goal tracking, and function usage. It more reliably:
- Calls external functions using correct schemas.
- Maintains context across multi-turn interactions.
- Produces valid structured output on request.
Reliability in Tool Use
Function-calling is more robust in GPT-5. It generates tool-structured outputs with:
- Higher accuracy and lower hallucination rates.
- Fewer schema violations in JSON outputs.
- More stable behavior when calling multiple tools in sequence.
How to Access GPT-5
GPT-5 is available through:
- ChatGPT (chat.openai.com): Enabled by default in the model selector for paid users. Automatically routes to GPT-5 in most cases.
- OpenAI API (platform.openai.com): Accessible via the gpt-5 model family. Supports both single-call and streaming interfaces.
- Azure OpenAI Service: Available under the GPT-5 deployment names depending on your region and subscription.
- Third-party apps & integrations: GPT-5 powers assistants in Microsoft products (e.g. Word, Excel) and other OpenAI API partners.
You don’t need to tweak your prompts: GPT-5 works with prompts built for GPT-4 Turbo, but gives you better reasoning, stronger multilingual support, and longer context handling.
GPT-OSS: OpenAI’s Open-Weight Models
OpenAI also recently released two open-weight large language models, its first open-weight release since GPT-2.
- gpt-oss-120b: ~117B total parameters, ~5.1B active per token.
- gpt-oss-20b: ~21B total, ~3.6B active.
Both models use a Mixture-of-Experts (MoE) architecture. Only a subset of parameters is active during inference, which improves efficiency and reduces computational cost.
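The active-parameter figures above follow directly from MoE routing: a gate scores the experts for each token and only the top-k run. Here is a toy numeric sketch of that idea; real MoE layers use learned gates inside every transformer block, not hand-set scores.

```python
# Toy Mixture-of-Experts forward pass: only the k highest-scoring
# experts run for a given token, so most parameters stay inactive.

def moe_forward(token: float, experts, gate_scores, k: int = 2) -> float:
    """Run only the k top-scored experts and mix their outputs by gate weight."""
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    total = sum(gate_scores[i] for i in top)  # renormalize over the selected experts
    return sum(gate_scores[i] / total * experts[i](token) for i in top)

# Four tiny "experts", each just scaling its input:
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
out = moe_forward(1.0, experts, gate_scores=[0.1, 0.2, 0.3, 0.4], k=2)
# Only the two highest-scored experts contributed to `out`.
```

With 4 experts and k=2, half the expert parameters are touched per token; gpt-oss pushes the same idea much further (~5.1B active of ~117B total).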
Key Features
- Apache 2.0 license: Fully open for commercial and research use
- Supports 128K context: Thanks to RoPE extension and sliding window attention
- Compatible with open inference engines: Tested with vLLM, TGI, Hugging Face Transformers
- Can run on a single 24GB consumer GPU (gpt-oss-20b), or a single H100 (gpt-oss-120b)
- Instruction-tuned versions available: Released alongside base models
Architecture
The models use Group Query Attention (GQA) and sliding window attention. These techniques help support long-context inference and improve efficiency across hardware setups. The models are trained with RoPE embeddings extended to 128K context length, making them suitable for use in RAG systems.
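Sliding-window attention can be pictured as a banded causal mask: each token attends only to itself and a fixed number of preceding tokens, which bounds attention cost at long context lengths. The window size below is illustrative, not the one gpt-oss uses.

```python
# Sketch of a causal sliding-window attention mask: token i may attend
# to token j only if j is within the last `window` positions up to i.

def sliding_window_mask(seq_len: int, window: int):
    """mask[i][j] is True where token i may attend to token j."""
    return [
        [max(0, i - window + 1) <= j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sliding_window_mask(seq_len=5, window=3)
print(mask[4])  # token 4 attends only to tokens 2, 3, 4
```

Because each row has at most `window` True entries, attention memory grows linearly in sequence length instead of quadratically.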
Performance
gpt-oss-120b performs competitively with OpenAI’s o4-mini:
- Strong results on MMLU, GPQA, AIME, and Codeforces
- Outperforms o3-mini in math, health, and science at a smaller scale
This makes GPT‑OSS viable for production environments where hosted solutions aren’t an option.
Intended Application
GPT-OSS is designed to support:
- Long-context applications
- RAG
- Tool use and agentic workflows
- Instruction-following tasks
How To Access GPT-OSS
- Download & Self‑Host (Open-Weight): The model weights (gpt-oss-120b, gpt-oss-20b) are freely downloadable under the Apache 2.0 license from Hugging Face or GitHub.
- Use via Inference Providers (Managed Hosting): If you prefer not to self-host, GPT‑OSS models are accessible via managed platforms like Hugging Face Inference Providers.
- You cannot access it via OpenAI’s hosted API or ChatGPT platforms.
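Since self-hosted GPT-OSS servers (for example vLLM) typically expose an OpenAI-compatible endpoint, calling your own deployment is just a chat-completions payload pointed at your own host. The localhost URL and port below are assumptions based on vLLM's defaults; adjust them for your setup. Only the payload is built here, no request is sent.

```python
# Sketch: a request to a self-hosted GPT-OSS model behind an
# OpenAI-compatible server. URL/port are assumed local defaults.

def build_local_request(prompt: str) -> tuple:
    url = "http://localhost:8000/v1/chat/completions"  # assumed local vLLM server
    payload = {
        "model": "openai/gpt-oss-20b",  # Hugging Face model id
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return url, payload

url, payload = build_local_request("Explain RoPE in one paragraph.")
```

Because the endpoint shape matches OpenAI's, existing OpenAI-client code can usually be repointed at the local URL with no other changes.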
Use Cases: When to Use What
GPT-5
Use GPT-5 when you need top-tier performance:
- Handles text, image, audio, and video
- Strong at multilingual, spatial, and scientific reasoning
- Ideal for production, large context (up to 400K tokens via the API), and commercial deployment via API or ChatGPT
GPT-OSS
Use GPT-OSS when you need full control:
- Runs locally, inspectable weights
- Good for fine-tuning, domain adaptation, or academic work
- Ideal for building open-source tools or constrained deployments
Bottom line
Need accuracy and scale? Use GPT-5.
Need transparency and control? Use GPT-OSS.
Key Takeaways
- GPT‑5 is OpenAI’s most advanced model, with better reasoning, 400K context, and improved tool use.
- Includes Mini, Nano, and Pro variants, optimized for different use cases—from edge devices to high-stakes reasoning.
- GPT‑OSS offers open-weight models (120B & 20B) with MoE and 128K context, great for transparency and local use.
- Available via ChatGPT, API, Azure, and Microsoft apps, and works with existing GPT-4 prompts.
- Use GPT‑5 for performance, GPT‑OSS for control and openness.
Explore our products