OpenAI Unveils GPT-5.4 with Native Computer Control and Advanced Reasoning

OpenAI has launched its latest foundational model, GPT-5.4, which introduces significant advancements in autonomous reasoning, coding, multimodal understanding, and native computer-use capabilities [2, 3]. This release marks a new phase in the company’s development of frontier large language models, aiming to redefine professional workflows and enterprise solutions.

With native computer control and enhanced autonomous reasoning, GPT-5.4 moves OpenAI closer to fully autonomous AI agents that can execute complex, end-to-end tasks across applications. Such agents could significantly change enterprise workflows in areas like coding, data analytics, finance, and customer service. The release arrives amid heightened competitive pressure and scrutiny within the AI industry [3, 4].

Core Capabilities and Model Variants

OpenAI describes GPT-5.4 as its “most capable and efficient frontier model for professional work,” specifically designed to enhance performance across a spectrum of business applications [2, 3, 4]. The model is available in two primary variants: GPT-5.4 Thinking, which is optimized for complex reasoning tasks, and GPT-5.4 Pro, engineered for maximum performance on demanding workloads [2, 3].

GPT-5.4 Thinking is accessible to ChatGPT Plus, Team, and Pro subscribers, while GPT-5.4 Pro is rolling out for ChatGPT Enterprise and Edu subscribers, as well as through the API and Codex [3, 4]. This new family of models replaces the previous GPT-5.2 Thinking model, which is slated for deprecation within three months [3]. The update builds upon recent improvements to the Instant model, which focused on more natural conversational interactions [3].

Native Computer Interaction

A key new capability introduced in GPT-5.4 is its native computer-use functionality, marking it as the first general-use model from OpenAI to offer such direct interaction [3, 4]. This allows the model to autonomously operate across different software applications on a machine, effectively acting on behalf of the user [4]. The system can issue keyboard and mouse commands, interact with screenshots, and operate various software tools through libraries like Playwright [3, 4].

The model is built for agentic workflows, in which the AI interacts with software, browsers, and other tools autonomously [3]. It supports long-running, multi-step agent trajectories and can verify its own actions, iterating through build-run-verify-fix loops [3]. GPT-5.4 scored a record 75% on the OSWorld-Verified benchmark, above the 72.4% scored by human testers [3].
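The build-run-verify-fix loop described above can be illustrated with a minimal Python sketch. This is not OpenAI's implementation or API: the `build_run_verify_fix` function, the `LoopResult` type, and the toy callbacks in `demo` are all hypothetical names chosen for illustration, and a real agent would replace the callbacks with model calls and tool executions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    """Outcome of one agent run (hypothetical type, for illustration only)."""
    succeeded: bool
    attempts: int
    log: List[str] = field(default_factory=list)


def build_run_verify_fix(
    build: Callable[[], str],
    run: Callable[[str], str],
    verify: Callable[[str], bool],
    fix: Callable[[str, str], None],
    max_attempts: int = 5,
) -> LoopResult:
    """Repeat build -> run -> verify, applying a fix after each failed check."""
    log: List[str] = []
    for attempt in range(1, max_attempts + 1):
        artifact = build()          # produce a candidate (code, config, ...)
        output = run(artifact)      # execute it and capture the result
        if verify(output):          # check the result against the goal
            log.append(f"attempt {attempt}: verified")
            return LoopResult(True, attempt, log)
        log.append(f"attempt {attempt}: failed with {output!r}")
        fix(artifact, output)       # adjust state before the next attempt
    return LoopResult(False, max_attempts, log)


def demo() -> LoopResult:
    """Toy task: find the value whose double equals 6."""
    state = {"value": 1}

    def build() -> str:
        return str(state["value"])

    def run(artifact: str) -> str:
        return str(int(artifact) * 2)

    def verify(output: str) -> bool:
        return output == "6"

    def fix(artifact: str, output: str) -> None:
        state["value"] += 1

    return build_run_verify_fix(build, run, verify, fix)


if __name__ == "__main__":
    result = demo()
    print(result.succeeded, result.attempts)
```

The `max_attempts` cap is a design choice of this sketch: it keeps a long-running trajectory from looping forever when verification never passes.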

Enhanced Reasoning and Context

For complex queries, GPT-5.4 Thinking introduces a visible plan-of-action outline, offering users transparency into the model’s autonomous reasoning process [3]. This feature allows users to intervene mid-response to adjust the direction of the AI’s work without needing to restart the entire interaction [3]. This functionality is currently live on web and Android platforms, with an iOS rollout anticipated soon [3].

The model also enhances deep web research, particularly for highly specific queries, by better maintaining context over extended periods of thought [3]. Developers can leverage the API version of GPT-5.4, which offers an expansive context window of up to one million tokens, the largest available from OpenAI to date [2, 3]. Furthermore, its image perception capabilities support inputs exceeding 10 million pixels without compression, thereby improving detail retention in visual analysis [3].

Performance Benchmarks and Factual Accuracy

OpenAI asserts that GPT-5.4 delivers improved performance across a range of professional tasks, including working with spreadsheets, creating presentations, and handling documents, alongside stronger coding capabilities and support for more complex multi-step tasks [1].

Benchmark results reflect these improvements. GPT-5.4 achieved record scores on the computer-use benchmarks OSWorld-Verified and WebArena Verified [2, 3]. It also scored 83% on OpenAI’s own GDPval test, which assesses knowledge work tasks [2]. Brendan Foody, CEO of Mercor, stated that GPT-5.4 takes the lead on Mercor’s APEX-Agents benchmark, designed to evaluate professional skills in legal and financial domains [2]. On a spreadsheet analysis benchmark, the score rose to 87.3% from GPT-5.2’s 68.4% [3].
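To put the reported scores in perspective, the short Python sketch below separates a percentage-point gain from a relative gain. The function names are illustrative, and the only inputs are the figures quoted in this article.

```python
def point_gain(new_score: float, old_score: float) -> float:
    """Absolute difference in percentage points."""
    return new_score - old_score


def relative_gain(new_score: float, old_score: float) -> float:
    """Improvement as a fraction of the old score."""
    if old_score <= 0:
        raise ValueError("old_score must be positive")
    return (new_score - old_score) / old_score


# Spreadsheet analysis: 87.3% (GPT-5.4) vs 68.4% (GPT-5.2), as reported in [3].
spreadsheet_points = round(point_gain(87.3, 68.4), 1)
spreadsheet_relative_pct = round(relative_gain(87.3, 68.4) * 100, 1)

# OSWorld-Verified: 75% (GPT-5.4) vs 72.4% (human testers), as reported in [3].
osworld_points = round(point_gain(75.0, 72.4), 1)

if __name__ == "__main__":
    print(spreadsheet_points, spreadsheet_relative_pct, osworld_points)
```

On these figures, the spreadsheet result is an 18.9-point gain, or roughly 27.6% relative to the GPT-5.2 score, while the margin over human testers on OSWorld-Verified is 2.6 points.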

Reducing Hallucinations and Errors

A key focus for GPT-5.4 has been the reduction of hallucinations and errors, an ongoing challenge in large language models [1, 2]. OpenAI calls GPT-5.4 its “most factual model yet” [1, 2]. According to OpenAI’s internal benchmarks, individual claims generated by GPT-5.4 are 33% less likely to be false than those of its predecessor, GPT-5.2 [1, 2, 3, 4]. Full responses from GPT-5.4 are 18% less likely to contain any errors than those from GPT-5.2 [1, 2, 4].
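Both figures are relative reductions, so their practical meaning depends on the baseline error rate, which the sources cited here do not give. The Python sketch below shows the arithmetic under assumed baselines: the 6% claim-level and 20% response-level rates are hypothetical placeholders, not measurements of GPT-5.2.

```python
def reduced_rate(baseline_rate: float, relative_reduction: float) -> float:
    """Apply a relative reduction (e.g. 0.33 for '33% less likely') to a rate."""
    if not 0.0 <= baseline_rate <= 1.0:
        raise ValueError("baseline_rate must be between 0 and 1")
    if not 0.0 <= relative_reduction <= 1.0:
        raise ValueError("relative_reduction must be between 0 and 1")
    return baseline_rate * (1.0 - relative_reduction)


# Hypothetical baselines, assumed purely for illustration.
ASSUMED_CLAIM_ERROR_RATE = 0.06      # 6% of individual claims false
ASSUMED_RESPONSE_ERROR_RATE = 0.20   # 20% of responses contain an error

# Reductions as reported by OpenAI: 33% per claim, 18% per response.
new_claim_rate = reduced_rate(ASSUMED_CLAIM_ERROR_RATE, 0.33)
new_response_rate = reduced_rate(ASSUMED_RESPONSE_ERROR_RATE, 0.18)

if __name__ == "__main__":
    print(f"claims: {new_claim_rate:.2%}, responses: {new_response_rate:.2%}")
```

Under these assumed baselines, the claim-level rate would fall from 6% to about 4%, and the response-level rate from 20% to about 16.4%. A response can contain many claims, which is one plausible reason the response-level reduction is smaller than the claim-level one.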

Safety Measures and Competitive Landscape

OpenAI has prioritized safety in the development and release of GPT-5.4, implementing strengthened safeguards and maintaining a high cyber-risk classification [1]. Additional protections have been implemented, including expanded cyber safety systems, enhanced monitoring tools, trusted access controls, and request blocking for higher-risk activities on Zero Data Retention surfaces [1].

In parallel with the GPT-5.4 launch, OpenAI published new safety research focused on monitoring how models reason [1]. This research included an open-source evaluation designed to test whether AI systems could conceal their internal reasoning processes [1]. The findings indicated that GPT-5.4 Thinking exhibited a low ability to obscure its reasoning, which the company characterized as a positive safety signal [1].

Market Dynamics and Strategic Context

The release of GPT-5.4 occurs during a period of intense competition in the AI market. Reports suggest that users have been migrating from ChatGPT to rival chatbots, particularly Anthropic’s Claude [1, 4]. This competitive pressure was previously highlighted by the “code red” response that led to the release of GPT-5.2 in December, following advancements from Google Gemini and Anthropic’s Claude [3].

OpenAI’s launch also follows a period of public scrutiny, including a “much-maligned decision” to engage in business with the Department of Defense [4]. This decision reportedly led to a loss of approximately 1.5 million users for ChatGPT and internal opposition from some employees [4]. The company appears to be leveraging the GPT-5.4 release as an opportunity to regain public confidence and reaffirm its leadership in AI innovation [4].

API Enhancements for Developers

For developers, GPT-5.4 introduces significant improvements in tool calling management through its API, enhancing efficiency and cost-effectiveness [2]. A new system named “Tool Search” has been launched, allowing models to dynamically look up tool definitions as required [2]. This innovation is expected to result in faster and more cost-effective requests, particularly in systems that integrate a large number of available tools [2]. The API’s expanded context window of up to one million tokens further supports developers in building more sophisticated and context-aware applications that leverage GPT-5.4’s enhanced autonomous reasoning [2].
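The idea behind looking up tool definitions on demand can be sketched in a few lines of Python. This is a conceptual illustration only and does not use or reproduce OpenAI's Tool Search API: `ToolDefinition`, `ToolRegistry`, the keyword-overlap scoring, and the three example tools are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToolDefinition:
    """A tool schema that would otherwise be sent with every request."""
    name: str
    description: str
    parameters: Dict[str, str]


class ToolRegistry:
    """Holds all tool definitions and returns only those relevant to a query."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolDefinition] = {}

    def register(self, tool: ToolDefinition) -> None:
        self._tools[tool.name] = tool

    def search(self, query: str, limit: int = 3) -> List[ToolDefinition]:
        """Rank tools by how many query words appear in their name or description."""
        terms = set(query.lower().split())
        scored = []
        for tool in self._tools.values():
            text = tool.name.replace("_", " ") + " " + tool.description
            words = set(text.lower().split())
            score = len(terms & words)
            if score > 0:
                scored.append((score, tool.name))
        # Highest score first; ties broken alphabetically for stable output.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [self._tools[name] for _, name in scored[:limit]]


def build_example_registry() -> ToolRegistry:
    registry = ToolRegistry()
    registry.register(ToolDefinition(
        "get_weather", "Look up the current weather forecast for a city",
        {"city": "string"}))
    registry.register(ToolDefinition(
        "send_email", "Send an email message to a recipient",
        {"to": "string", "body": "string"}))
    registry.register(ToolDefinition(
        "query_invoices", "Search finance invoices by customer and date",
        {"customer": "string", "date": "string"}))
    return registry


if __name__ == "__main__":
    matches = build_example_registry().search("weather forecast for Paris")
    print([tool.name for tool in matches])
```

In this sketch, only the matching definitions would be placed in the prompt, which is where the savings would come from in a system with hundreds of tools. A production system would likely use a more robust retrieval method than keyword overlap.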

Conclusion

OpenAI’s GPT-5.4 represents a substantial leap forward in large language model capabilities, particularly with its introduction of native computer control and enhanced autonomous reasoning. By improving factual accuracy, expanding context windows, and implementing robust safety measures, OpenAI aims to solidify its position in a rapidly evolving and competitive AI landscape. The model’s ability to execute multi-step tasks and interact with software autonomously signals a clear trajectory towards more capable and integrated AI agents, poised to redefine professional workflows and enterprise solutions.

Frequently Asked Questions

What is OpenAI’s GPT-5.4?

GPT-5.4 is OpenAI’s latest foundational large language model, introducing significant advancements in autonomous reasoning, native computer interaction, multimodal understanding, and coding capabilities. It is designed for professional work and aims to transform enterprise workflows.

What are the key new capabilities of GPT-5.4?

The primary new capabilities of GPT-5.4 include native computer-use functionality, allowing it to autonomously operate software and systems, and enhanced autonomous reasoning, which provides transparency into its thought process for complex queries. It also boasts improved factual accuracy and larger context windows.

Which variants of GPT-5.4 are available?

GPT-5.4 is available in two primary variants: GPT-5.4 Thinking, optimized for complex reasoning tasks, and GPT-5.4 Pro, designed for maximum performance on demanding workloads. These variants cater to different subscriber tiers and API access.

How does GPT-5.4 improve factual accuracy and reduce errors?

OpenAI states that GPT-5.4 is its “most factual model yet.” Internal benchmarks indicate that individual claims generated by GPT-5.4 are 33% less likely to be false, and full responses are 18% less likely to contain any errors compared to its predecessor, GPT-5.2.

What safety measures has OpenAI implemented for GPT-5.4?

OpenAI has strengthened safeguards for GPT-5.4, including expanded cyber safety systems, enhanced monitoring tools, trusted access controls, and request blocking for higher-risk activities. Research also indicates a low ability for GPT-5.4 Thinking to obscure its reasoning, which is considered a positive safety signal.

Sources

Renato C O

"Renato Oliveira is the founder of IverifyU, a website dedicated to helping users make informed decisions with honest reviews and practical insights. Passionate about tech, Renato aims to provide valuable content that entertains, educates, and empowers readers to choose the best."
