GPT-5.4 Is Here: OpenAI’s Most Powerful Model With Native Computer Use, 1M Token Context, and Professional-Grade Performance
OpenAI launched GPT-5.4 on March 5, 2026, and this release is meaningfully different from what came before. It is not an incremental update. It is the first general-purpose OpenAI model with native computer control built in, the first to support a 1 million token context window, and it matches or exceeds industry professionals on real professional tasks in 83% of GDPval comparisons.
GPT-5.4 is already live in ChatGPT for Plus, Team, Pro, and Enterprise users, and it is available via the API today. It replaces GPT-5.2 Thinking as the default reasoning model; GPT-5.2 Thinking retires on June 5, 2026.
Here’s exactly what GPT-5.4 does, what makes it different, and what it means for you whether you’re a developer, a professional, or an everyday ChatGPT user.
What Is GPT-5.4 and Why Does It Matter?
GPT-5.4 is OpenAI’s new flagship frontier model — described by the company as “our most capable and efficient frontier model for professional work.” It unifies two model lines that were previously separate: the general-purpose GPT series and the coding-specialized Codex series (GPT-5.3-Codex). The name jump from 5.2 to 5.4 reflects this merger, not just an incremental improvement.
What that unification means in practice: GPT-5.4 brings advanced coding and agentic capabilities into the same model you use for writing, analysis, and research. You no longer switch between a general model and a coding agent. One model handles both — with better performance on each than the dedicated predecessors.
OpenAI releases GPT-5.4 in two variants:
- GPT-5.4 Thinking — the reasoning-focused version available to ChatGPT Plus, Team, and Pro users. Replaces GPT-5.2 Thinking as the default.
- GPT-5.4 Pro — a more capable version available to Pro and Enterprise plans for the highest-stakes professional tasks.
Both versions are available in Codex and via the API. Enterprise and Edu plans can enable early access through admin settings.
GPT-5.4 Benchmark Results: What the Numbers Show
OpenAI backs this release with specific benchmark numbers across professional, computer use, coding, and reasoning categories. Here’s the verified data:
- OSWorld-Verified (computer use): 75.0% — up from GPT-5.2’s 47.3%, and above the measured human baseline of 72.4%
- BrowseComp (web research): GPT-5.4 improved 17 percentage points over GPT-5.2; GPT-5.4 Pro reaches 89.3% — a new state of the art
- GDPval (knowledge work across 44 occupations): 83% — GPT-5.4 matches or exceeds industry professionals in 83% of comparisons, versus 71% for GPT-5.2
- MMMU-Pro (multimodal reasoning, no tool use): 81.2% — ahead of Gemini 3.1 Pro’s 80.5%
- APEX-Agents (law and finance professional skills): Ranked first across all models tested
- Investment banking benchmark (internal): Performance jumped from 43.7% with GPT-5 to 88.0% with GPT-5.4 Thinking
- Spreadsheet modeling (junior analyst tasks): 87.5% mean score versus 68.4% for GPT-5.2
- Reasoning accuracy: 88% on Benchable.ai evaluation
- Coding accuracy: 88% — “the most accurate among models of its speed tier”
The accuracy improvements are equally notable. GPT-5.4 is 33% less likely to produce errors in individual claims compared to GPT-5.2, and overall responses are 18% less likely to contain factual mistakes. For anyone who has been burned by confident hallucinations in previous models, those numbers matter.
Native Computer Use: The Biggest Capability Shift
GPT-5.4 is the first general-purpose OpenAI model to ship with native, production-ready computer use built directly in — not as a separate specialized model, but as a core capability of the same model you use for everything else.
In practice, this means GPT-5.4 can take direct control of a computer: clicking, typing, navigating software, filling forms, operating websites, and completing multi-step workflows across multiple applications — all driven by screenshots and keyboard/mouse commands. Developers can build agents that actually operate websites and apps, not just generate text about them.
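Conceptually, that workflow is a perceive-act loop: the model sees a screenshot, emits an action, and the harness executes it. The sketch below illustrates that loop only; the `Action` schema and the `model_step`/`take_screenshot`/`execute` callables are invented for illustration and are not OpenAI's actual API surface.

```python
from dataclasses import dataclass

# Hypothetical action schema for a screenshot-driven computer-use loop.
# The real API surface may differ; this only illustrates the control flow.
@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    x: int = 0         # screen coordinates for clicks
    y: int = 0
    text: str = ""     # text to type

def run_agent(model_step, take_screenshot, execute, max_steps=50):
    """Drive the computer until the model signals completion or steps run out."""
    for _ in range(max_steps):
        action = model_step(take_screenshot())  # model sees pixels, returns an action
        if action.kind == "done":
            return True
        execute(action)                         # click/type via OS automation
    return False
```

The `max_steps` cap matters in practice: screenshot-driven agents can loop on ambiguous UI states, so production harnesses bound the episode length.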
The OSWorld-Verified benchmark number tells the story clearly. GPT-5.4 hits 75.0% success navigating a real desktop environment through screenshots alone. GPT-5.2 managed 47.3%. The measured human baseline sits at 72.4%. GPT-5.4 is already better than the average person at navigating a computer this way.
Developers have additional control over this capability: the model’s computer use behavior is steerable via developer messages, and developers can configure custom confirmation policies to adjust safety behavior for different levels of risk tolerance. That’s a meaningful addition — it means enterprises can deploy computer use agents with guardrails appropriate to their environment rather than accepting a one-size-fits-all safety profile.
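OpenAI has not published the confirmation-policy format, so the sketch below only illustrates the idea of a risk-tiered policy: actions above a deployment's risk tolerance require human sign-off. Every name here is invented for illustration.

```python
# Illustrative only: a risk-tiered confirmation policy for computer-use actions.
# The policy format OpenAI actually exposes may look nothing like this.
RISK_TIERS = {
    "read_page": "low",
    "click": "medium",
    "submit_form": "high",
    "send_payment": "high",
}

def needs_confirmation(action: str, tolerance: str = "medium") -> bool:
    """Require human sign-off when an action's risk exceeds the tolerance."""
    order = ["low", "medium", "high"]
    risk = RISK_TIERS.get(action, "high")  # unknown actions default to high risk
    return order.index(risk) > order.index(tolerance)
```

Defaulting unknown actions to high risk is the conservative choice: a new tool surface never silently bypasses review.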
On WebArena Verified, which tests agentic web browsing across real websites, GPT-5.4 also sets a new record. The browsing improvements show up in BrowseComp as well — the benchmark measuring how well agents find hard-to-locate information through persistent web research.
The 1 Million Token Context Window
GPT-5.4 introduces a 1 million token context window in preview via the API — specifically 922,000 input tokens and 128,000 output tokens. For reference, GPT-5.2 had roughly half this capacity.
What does a million tokens actually let you do? To put it concretely: an average novel is around 90,000 words, roughly 120,000 tokens. A million tokens means you can load the equivalent of eight full-length novels, or an entire large codebase, or hundreds of legal documents, or months of email threads into a single session and have the model reason across all of it at once.
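The back-of-envelope arithmetic behind the "eight novels" figure, using the input window and the token estimate quoted above:

```python
# Back-of-envelope check of the figures quoted above.
INPUT_TOKENS = 922_000       # GPT-5.4 input window (API preview)
TOKENS_PER_NOVEL = 120_000   # ~90,000 words at roughly 1.33 tokens per word

novels = INPUT_TOKENS / TOKENS_PER_NOVEL
print(f"{novels:.1f} novels fit in one context")  # → "7.7 novels fit in one context"
```

Against the full 1M-token figure rather than the 922K input window, the ratio rounds up to the "roughly eight novels" cited above. Actual token counts vary with the tokenizer and the text.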
For developers building agentic systems, this changes what’s architecturally possible. Tasks that previously required chunking — splitting documents into pieces because the model couldn’t hold them all at once — can now run in a single pass. For enterprises processing large contracts, financial filings, or research corpora, the practical implications are immediate.
The context window preview is available today via the API. ChatGPT users on Thinking and Pro have context windows unchanged from GPT-5.2 Thinking for now.
Tool Search: Smarter, Cheaper API Calls for Developers
One of GPT-5.4’s most practically significant developer features is a new system called Tool Search — a structural fix to a real problem in how agentic AI handles large tool ecosystems.
Previously, every API request had to include definitions for every available tool upfront in the system prompt. As tool libraries grew, this became expensive: more tokens per request, slower responses, higher costs. For MCP servers with tens of thousands of tokens of tool definitions, the problem compounded quickly.
Tool Search changes this. Instead of receiving all tool definitions upfront, GPT-5.4 receives a lightweight list of available tools plus a search capability. When it needs to use a specific tool, it looks up that tool’s full definition at that moment and appends it to the conversation. The result: dramatically fewer tokens consumed on tool-heavy requests, faster responses, lower costs — and the ability to work reliably with much larger tool ecosystems than before.
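The pattern can be sketched as a registry that exposes only names and one-line descriptions upfront, resolving full definitions on demand. This is a minimal sketch of the idea as described above, not OpenAI's implementation; the tool names and schema layout are invented.

```python
# Illustrative sketch of the tool-search pattern: the prompt carries only
# names and short descriptions; full definitions are fetched on demand.
# All tool names and schemas here are invented for illustration.
TOOL_DEFINITIONS = {
    "get_weather": {"description": "Current weather for a city",
                    "parameters": {"city": {"type": "string"}}},
    "send_email":  {"description": "Send an email",
                    "parameters": {"to": {"type": "string"},
                                   "body": {"type": "string"}}},
}

def tool_index():
    """Lightweight list sent upfront: names and one-line descriptions only."""
    return [{"name": name, "description": defn["description"]}
            for name, defn in TOOL_DEFINITIONS.items()]

def search_tool(query: str):
    """Resolve a query to a full definition only when the model asks for it."""
    for name, defn in TOOL_DEFINITIONS.items():
        if query in name or query in defn["description"].lower():
            return {name: defn}
    return None
```

The savings come from the asymmetry: the upfront index costs a few tokens per tool, while full JSON-schema definitions can run to hundreds of tokens each and are now paid for only when actually used.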
The efficiency gain is concrete. In OpenAI’s evaluation of 250 tasks from Scale’s MCP Atlas benchmark with 36 MCP servers enabled, the tool-search configuration reduced total token usage by 47% while maintaining the same accuracy as exposing all MCP functions directly. For developers running high-volume agentic workflows, that 47% reduction translates directly into lower operating costs.
GPT-5.4 also improves tool calling accuracy itself — achieving higher accuracy in fewer turns on Toolathlon, the benchmark that tests AI agents completing multi-step real-world tool tasks. Fewer turns, better parallelization, faster task completion.
GPT-5.4 for Professional Work: Finance, Documents, and Presentations
Alongside the model launch, OpenAI announced a dedicated suite called OpenAI for Financial Services — a set of enterprise tools powered by GPT-5.4’s professional reasoning capabilities. This includes:
- ChatGPT for Excel and Google Sheets (beta) — ChatGPT embedded directly in spreadsheets to build, analyze, and update complex financial models using existing formulas and structures
- New ChatGPT app integrations unifying market data, company data, and internal data into a single workflow — with named partners including FactSet, MSCI, Third Bridge, and Moody’s
- Reusable Skills for recurring finance tasks: earnings previews, comparables analysis, DCF analysis, and investment memo drafting
The benchmark results for professional document work are some of the most striking in the entire release. On spreadsheet modeling tasks modeled after junior investment banking analyst work, GPT-5.4 scores 87.5% versus GPT-5.2’s 68.4%. On presentation tasks, human raters preferred GPT-5.4’s output 68% of the time over GPT-5.2, citing stronger aesthetics, greater visual variety, and more effective image generation.
Mercor CEO Brendan Foody, whose company developed the APEX-Agents benchmark for professional skills in law and finance, described GPT-5.4’s output directly: the model excels at creating long-horizon deliverables such as slide decks, financial models, and legal analysis — delivering top performance while running faster and at lower cost than competitive frontier models.
GPT-5.4 Thinking Mode: How It Works
GPT-5.4 Thinking is the reasoning-focused version that shows its work before delivering a final answer. When you ask a question, it generates an upfront plan of its reasoning process — and critically, you can adjust the direction of that plan while the model is still working through it.
This is more useful than it might sound. On complex tasks — financial analysis, research synthesis, legal review — being able to redirect the model’s reasoning approach partway through can significantly improve the final output quality without requiring you to start over with a new prompt.
There’s also a safety dimension. OpenAI’s internal evaluation finds that deception is less likely in GPT-5.4 Thinking than in the standard version, “suggesting that the model lacks the ability to hide its reasoning and that CoT monitoring remains an effective safety tool.” The chain-of-thought visibility isn’t just useful for users — it makes the model’s behavior more auditable for safety oversight.
GPT-5.4 API Pricing
OpenAI has published the following API pricing for GPT-5.4:
- gpt-5.4: $2.50 per million input tokens / $15 per million output tokens
- gpt-5.4-pro: $30 per million input tokens / $180 per million output tokens
The standard gpt-5.4 pricing at $2.50/$15 represents strong value for the capability level — particularly given the token efficiency improvements. Because GPT-5.4 uses significantly fewer tokens than GPT-5.2 on comparable tasks (and the Tool Search feature reduces token usage by up to 47% on tool-heavy workflows), real per-task costs can be lower than the input/output rate comparison alone suggests.
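A worked example at the published rates makes the point concrete. The token counts below are hypothetical, chosen only to show how the 47% saving on tool definitions can outweigh a flat rate comparison.

```python
# Per-task cost at the published gpt-5.4 rates ($2.50 in / $15 out per 1M tokens).
# Token counts below are hypothetical, for illustration only.
INPUT_RATE = 2.50 / 1_000_000    # dollars per input token
OUTPUT_RATE = 15.00 / 1_000_000  # dollars per output token

def task_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A tool-heavy request: 40,000 of its 50,000 input tokens are tool definitions...
full = task_cost(50_000, 2_000)
# ...versus the same request with Tool Search trimming those definitions by 47%.
lean = task_cost(50_000 - int(40_000 * 0.47), 2_000)
print(f"${full:.3f} vs ${lean:.3f} per request")
```

At high request volumes, that per-request difference compounds: the same saving applies to every call the agent makes.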
The Pro tier at $30/$180 is positioned for the highest-stakes enterprise use cases where GPT-5.4 Pro’s additional capability justifies the premium — primarily in professional services, finance, and complex agentic deployments.
Availability: Who Can Use GPT-5.4 Today
- ChatGPT Plus, Team: GPT-5.4 Thinking available now, replaces GPT-5.2 Thinking as default
- ChatGPT Pro: Both GPT-5.4 Thinking and GPT-5.4 Pro available now
- ChatGPT Enterprise and Edu: Enable early access via admin settings
- API: Available today at gpt-5.4 and gpt-5.4-pro model identifiers
- Codex: Available now — GPT-5.4 is the new default for Codex workflows
- GPT-5.2 Thinking retirement: Remains available in Legacy Models section for three months; retires June 5, 2026
The 1 million token context window is available in API preview now. ChatGPT context windows remain unchanged from GPT-5.2 Thinking at this stage.
GPT-5.4 vs GPT-5.2: What Actually Changed
| Capability | GPT-5.2 | GPT-5.4 |
|---|---|---|
| OSWorld-Verified (computer use) | 47.3% | 75.0% |
| GDPval (knowledge work) | 71% | 83% |
| Context Window | ~500K tokens | 1M tokens (preview) |
| Claim-level error rate | Baseline | 33% fewer errors |
| Native computer use | ❌ | ✅ |
| Tool Search | ❌ | ✅ (47% token reduction) |
| BrowseComp Pro score | — | 89.3% |
| Codex capabilities unified | ❌ Separate model | ✅ Built in |
What This Means for Developers and Builders
For developers building agentic systems, GPT-5.4 changes the calculus on what’s practical to build. The combination of native computer use, Tool Search, the 1M token context window, and improved tool calling accuracy addresses the specific bottlenecks that have made production-grade agent deployment difficult — not the question of whether agents can perform tasks, but whether they can do so reliably and cost-efficiently at scale.
The MCP ecosystem benefits significantly. Tool Search was designed with large MCP server deployments in mind — specifically the scenario where an agent has access to dozens of MCP servers containing tens of thousands of tokens of tool definitions. Previously, loading all of that upfront made large-scale MCP deployment expensive and slow. The 47% token reduction on the Scale MCP Atlas benchmark demonstrates the practical efficiency gain.
The Codex unification also simplifies the development decision. Previously, developers chose between a general model and a specialized coding model depending on the task. GPT-5.4 performs at the level of GPT-5.3-Codex on coding tasks while handling the broader professional workflow — one model string to maintain, one set of API parameters to manage.
Final Thoughts: Is GPT-5.4 Worth Upgrading For?
For most ChatGPT paid users, the upgrade is automatic: GPT-5.4 Thinking replaces GPT-5.2 Thinking as the default. You don’t need to do anything; you’re already on the better model.
For developers and enterprises evaluating whether to migrate workloads, the answer depends on use case. If computer use, high-volume tool calling, large context processing, or professional document creation are part of your workflow, GPT-5.4 delivers material improvements on all of them — and the token efficiency gains can partially offset the migration effort through reduced API costs.
For users on the fence about the Pro tier, the investment banking benchmark jump (43.7% to 88.0%) and the financial services suite are the clearest signals that GPT-5.4 Pro is positioned for high-stakes professional workflows where output quality has direct financial consequences.
GPT-5.4 is available right now — in ChatGPT, Codex, and the API. If you’re building with OpenAI models or using ChatGPT professionally, it’s the model to be using in March 2026.
Frequently Asked Questions
What is GPT-5.4 and when was it released?
GPT-5.4 is OpenAI’s newest flagship frontier model, released on March 5, 2026. It is described as OpenAI’s most capable and efficient frontier model for professional work, unifying the general-purpose GPT line with the coding capabilities of GPT-5.3-Codex. It is available in ChatGPT, Codex, and via the API in two variants: GPT-5.4 Thinking and GPT-5.4 Pro.
What is new in GPT-5.4 compared to GPT-5.2?
GPT-5.4 introduces native computer use (75.0% on OSWorld-Verified vs GPT-5.2’s 47.3%), a 1 million token context window in API preview, Tool Search reducing token usage by 47% on tool-heavy workflows, 33% fewer claim-level errors, an 83% score on GDPval knowledge work benchmark, and unified Codex coding capabilities. It replaces GPT-5.2 Thinking as the default reasoning model in ChatGPT.
What is GPT-5.4 Tool Search?
Tool Search is a new API system where GPT-5.4 receives a lightweight list of available tools plus a search capability, looking up full tool definitions only when needed — rather than loading all definitions upfront. In testing on 250 tasks from Scale’s MCP Atlas benchmark with 36 MCP servers enabled, Tool Search reduced total token usage by 47% while maintaining accuracy, dramatically reducing costs for large agentic deployments.
What is GPT-5.4 API pricing?
GPT-5.4 is priced at $2.50 per million input tokens and $15 per million output tokens. GPT-5.4 Pro is priced at $30 per million input tokens and $180 per million output tokens. Due to improved token efficiency, real per-task costs can be lower than a direct rate comparison with GPT-5.2 suggests.
Who can use GPT-5.4 in ChatGPT today?
GPT-5.4 Thinking is available now to ChatGPT Plus, Team, and Pro users. GPT-5.4 Pro is available to Pro and Enterprise plans. Enterprise and Edu users can enable early access via admin settings. GPT-5.2 Thinking remains available in the Legacy Models section until June 5, 2026, when it retires.
What is GPT-5.4’s context window?
GPT-5.4 supports a 1 million token context window (specifically 922,000 input tokens and 128,000 output tokens) in preview via the API — roughly double GPT-5.2’s context capacity. This allows processing of entire large codebases, hundreds of documents, or months of data in a single session. ChatGPT context windows remain unchanged from GPT-5.2 Thinking at this stage.
What is OpenAI for Financial Services?
OpenAI for Financial Services is a suite of enterprise tools launched alongside GPT-5.4, including ChatGPT for Excel and Google Sheets (beta), new ChatGPT app integrations with FactSet, MSCI, Third Bridge, and Moody’s, and reusable Skills for recurring finance tasks including earnings previews, DCF analysis, and investment memo drafting. The investment banking benchmark improved from 43.7% with GPT-5 to 88.0% with GPT-5.4 Thinking.

Aman Alria is the founder of ClawdBot2.in and an artificial intelligence writer covering the latest AI news, tools, and trends. He breaks down complex AI topics into clear, honest content — from model comparisons and agent updates to AI regulation and learning resources. If it’s happening in AI, Aman is writing about it.