OpenAI's GPT-5.4 just got a lot more agentic

Welcome back. At MWC, we saw more evidence that the biggest AI opportunities may lie beyond frontier models, with startups using today’s AI to solve concrete problems in medicine, cybersecurity and computer vision. At the same time, Cursor is pushing coding agents into new frontiers beyond just generating code. It's reshaping software development with always-on automation to fill new gaps in the rapidly transforming coding space. And OpenAI’s GPT-5.4 signals that the next battle is for professional workflows, combining stronger reasoning, coding and agentic control into a model built to do more real work with less friction.

Jason Hiner

IN TODAY’S NEWSLETTER

1. OpenAI's GPT-5.4 just got a lot more agentic

2. Cursor expands coding agents beyond coding

3. Beyond models, applied AI finds real traction

BIG TECH

OpenAI's GPT-5.4 just got a lot more agentic

The race to be state-of-the-art continues to intensify among the frontier AI labs.

On Thursday, OpenAI released GPT-5.4, the latest addition to its flagship model series, calling the model its “most capable and efficient frontier model for professional work.” Along with making the model available in ChatGPT (labeled "GPT-5.4 Thinking" or "Thinking 5.4" in the chatbot), the API and Codex, OpenAI also released GPT‑5.4 Pro for “maximum performance” on complex tasks.

The model combines the best of OpenAI’s advances in reasoning, coding and agentic control into a single model, the company said, delivering consistent, more refined results on real-world tasks. 

Some of the updates include:

  • The ability to adjust course mid-response and provide upfront plans of its thinking for improved deep web research and better context maintenance

  • State-of-the-art computer-use capabilities that allow agents to carry out complex workflows, along with support for up to 1 million tokens of context

  • Tool searching, enabling agents to find and connect with the tools they need more quickly without sacrificing intelligence

  • And improved token efficiency, allowing it to use “significantly fewer tokens” to solve problems faster

“The result is a model that gets complex real work done accurately, effectively, and efficiently—delivering what you asked for with less back and forth,” OpenAI said in its announcement. 

OpenAI is pointedly targeting professionals with this update as its battle with enterprise AI rival Anthropic continues to heat up. And in advance of this release, OpenAI let some enterprises test the new models. 

Brendan Foody, CEO of Mercor, said in a post on X that the model is the best the company has ever tested on APEX-Agents, its benchmark for testing whether agents can execute long-horizon tasks. Foody stated that ChatGPT “will imminently be better than the best consulting firm, better than the best investment bank, and better than the best law firm.” 

Data management firm Box, meanwhile, tested the model’s extraction capabilities, or its ability to pull out multiple metadata fields from dense documents in a single pass, and found that the model’s accuracy increased 6 percentage points to 78%, up from 72% accuracy that GPT-5.2 exhibited. Yashodha Bhavnani, head of AI at Box, told The Deep View that this represents “a critical capability to drive and inform enterprise workflows.”

With buzz around rival Anthropic reaching new heights in recent weeks, it makes sense that OpenAI has zeroed in on enterprises for this release. Downloads of Anthropic’s Claude have skyrocketed as the firm garners goodwill from its users amid its fight with the US government. Plus, Anthropic is now closing the gap with OpenAI in revenue, revealing an ARR of $19 billion to OpenAI’s $25 billion. Given that Anthropic’s main audience is professionals, OpenAI wants to capture some of that attention by showing users that its models are ready to take on more agentic workflows and automate tasks for professionals. That's likely to become what GPT-5.4 is most known for.

Nat Rubio-Licht

TOGETHER WITH NEBIUS

Production inference for open-source LLMs

Nebius Token Factory is built for teams running open-source LLMs in real products.

We deliver managed inference with explicit control over execution paths, including:

  • speculative decoding

  • cache-aware routing

  • post-training tuned to real traffic

Predictable tail latency, stable cost, and production-grade systems for AI that actually ships.

PRODUCTS

Cursor expands coding agents beyond coding

Cursor wants to let you take your hands off the wheel.

The AI coding firm on Thursday launched “always-on agents,” or automated systems that run continuously in response to a company’s operations. These agents can connect to a company’s Slack, GitHub or PagerDuty, or run on scheduled timers without developer intervention.

In a blog post, Cursor noted that, while coding agents are now handling the majority of code writing, there are tons of other tasks surrounding software development that still pile up, such as code review, monitoring, and maintenance, which “haven’t sped up to the same extent yet.”

“At Cursor, we’ve been using automations to help scale these other parts of the development lifecycle,” the company said in its announcement. 

In testing automated agents within its own codebases, the company found two main areas of success: 

  • Review and monitoring, in which agents catch and fix anything from style inconsistencies to security pitfalls. These tasks include security reviews, incident response, and agentic “codeowners” for decision-making. 

  • Chores, such as weekly change summaries, testing and bug report triage, which the company also found worth automating.

Cursor sits at the center of one of AI’s hottest markets. This is evidenced by its own success, reportedly hitting $2 billion in annualized revenue after seeing its run rate double over the past three months.

Cursor was one of the hottest companies in the AI space in 2025, raising more than $2 billion in funding at a valuation of nearly $30 billion in November. But now, the company is competing for customers with perhaps the two most formidable AI firms in the space: Anthropic with Claude Code and OpenAI with Codex. As both of these AI giants step up their coding game to compete with one another, companies like Cursor have to find new ways to innovate and new niches to own. Though this could be an attempt to go beyond the competition, removing the human from some of these processes presents its own security risks.

This week in San Francisco, The Deep View hosted a dinner with several startup founders and technology executives from across multiple industries. One of the main topics that came up was how AI coding tools, vibe coding, and software development have been completely transformed, even in just the past eight weeks. Every company we talked to is now moving faster, iterating constantly, and completely upending a lot of their traditional systems and processes because of these tools. But surely these swift changes will create new gaps and inconsistencies in the modern code-building process, and it looks like these are the places Cursor wants to step in with solutions.

Nat Rubio-Licht
Jason Hiner, Editor-in-Chief

TOGETHER WITH ENERGYX

AI has the lithium boom heating up

Thanks to growing demand across high-growth sectors like AI and robotics, lithium stock prices grew 2X+ from June 2025 to January 2026. 

$ALB climbed as high as 227%; $LAC hit 151%; $SQM, 159%. But the real winner may be a stock not listed on public exchanges, EnergyX. 

This $1B unicorn’s patented technology can recover up to 3X more lithium than traditional methods, earning investment from leaders like General Motors. Now they’re preparing for commercial production just as experts project 5X demand growth by 2040.  

They’ve announced what could be one of the US’ largest lithium production facilities and have rights to ~150,000 lithium-rich acres across the Americas.  

STARTUPS

Beyond models, applied AI finds real traction

Tucked away in the depths of MWC was an entire hall dedicated to 4YFN, an initiative aimed at bolstering startups.

At a Pitch the Press event, I had the opportunity to speak with startup founders who, despite representing completely different industries, countries, and growth stages, shared one thing in common: using today's AI to solve practical problems.

In under-15-minute speed-dating sessions, I got a glimpse into each company's ethos, mission, and milestones and came away impressed. Here are the highlights:

  • Ultralytics: This open-source platform is helping millions of developers democratize computer vision. Founded in 2022, the company has since accumulated 125,000 GitHub stars, 500 clients, and 2.5 billion inferences. Some applications of the technology include defect detection in manufacturing, order and package size verification, and medical research examinations. 

  • Biorce: This AI solution helps clinical trial researchers design and develop trials faster. Its clients include biotech and pharmaceutical companies across Spain, APAC, and now the US. Founded in 2024, the company closed one of the largest seed rounds in Spanish startup history, raising over $52 million.

  • NeuralTrust: This cybersecurity platform for AI gives enterprises a firewall-like layer to detect sensitive data leakage, prompt injections, hallucinations, and more. Founded two years ago, the company has since earned backing and investment from the EU.

Other startups I spoke to included DRUO, a direct-debit payment method; Sycai Medical, an AI imaging solution for pre-cancerous detection; and Medwise AI, an AI search engine built for doctors. The full 4YFN exhibitors list has lots of other startups worth a look.

When thinking about AI startups, it's easy to focus on the companies building flashy things like frontier models. But the founders highlighted above tell a different story: there is plenty of room to build beyond the model layer, by developing solutions that put these technologies to work. When I spoke with the co-founder of Medwise AI, Keith Tsui, he shared that he thinks it is “unwise” to build your own model because you can’t keep up with frontier models. While much of the focus remains on the models themselves, it's good to remember that there's a lot of value accruing in novel applications and applied solutions.

LINKS

  • Gemini Canvas: Google has expanded Canvas in AI Mode to all users in the U.S., aiming to help users organize, plan and research projects.  

  • Shipper 2.0: A tool that lets Claude run your business for you, handling tasks such as building apps, coding, email marketing and self-maintenance. 

  • Willow Voice for Teams: An AI voice copilot that can accurately scale across your company. 

  • Eos AI: An autonomous operating system for healthcare, helping hospitals turn historical data into intelligence they can act on.

  • AI/ML Engineer: End-to-end model development, training pipelines and production deployment

  • AI Workflow Designer: Agentic system architecture, LLM orchestration and automation pipelines

  • Data Scientist: Statistical modeling, predictive analytics and data-driven decision systems

  • Cybersecurity Engineer: Threat detection, vulnerability assessment and AI-augmented security systems

(sponsored)

GAMES

Which image is real?

POLL RESULTS

Which AI coding tool do you prefer?

Claude Code (61%)
OpenAI Codex (15%)
Lovable (5%)
Cursor (4%)
Replit (4%)
Other (11%)

The Deep View is written by Nat Rubio-Licht, Sabrina Ortiz, Jason Hiner, Faris Kojok and The Deep View crew. Please reply with any feedback.

Thanks for reading today’s edition of The Deep View! We’ll see you in the next one.

“[This image] is slightly out of focus - an artistic choice, but AI would have sharpened it.”

“The direction of waves look more natural on [this image].”

“AI knows the rule of thirds, but not all humans do.”

“[This image] seemed too perfect, inner wave was disconnected.”

“[This image] looked clearer and more idealised so I thought it wasn't real.”

“In [this] image you see foam, which never happen on waves so small.”

Take The Deep View with you on the go! We’ve got exclusive, in-depth interviews for you on The Deep View: Conversations podcast every Tuesday morning.

If you want to get in front of an audience of 750,000+ developers, business leaders and tech enthusiasts, get in touch with us here.

*Indicates sponsored content

*EnergyX Disclaimer: Energy Exploration Technologies, Inc. (“EnergyX”) has engaged The Deep View to publish this communication in connection with EnergyX’s ongoing Regulation A offering. The Deep View has been paid in cash and may receive additional compensation. The Deep View and/or its affiliates do not currently hold securities of EnergyX.

This compensation and any current or future ownership interest could create a conflict of interest. Please consider this disclosure alongside EnergyX’s offering materials. EnergyX’s Regulation A offering has been qualified by the SEC. Offers and sales may be made only by means of the qualified offering circular. Before investing, carefully review the offering circular, including the risk factors. The offering circular is available at invest.energyx.com/.

Comparisons to other companies are for informational purposes only and should not imply similar results.

Under Regulation A+, a company has the ability to change its share price by up to 20%, without requalifying the offering with the SEC.