
DeepSeek V4 puts frontier labs on notice, again

Welcome back. Today, we’re looking at an enterprise reality check behind the AI boom. Researcher Dan Klein argues that enterprises do not need superintelligence as much as super-reliability, because the most dangerous hallucinations are the ones that go under the radar. Anthropic added memory to Claude Managed Agents, closing a key gap for companies that want safer agents that need less babysitting. And DeepSeek V4 shows the cost war is far from over, pairing frontier-level benchmarks with a less expensive open model. Bigger AI still matters. But cheaper, safer, more reliable AI may matter more to businesses. Jason Hiner

IN TODAY’S NEWSLETTER

1. DeepSeek V4 resets expectations for frontier AI

2. Anthropic closes critical enterprise agent gap

3. Frontier AI is outrunning enterprise reality

PRODUCTS

DeepSeek V4 puts frontier labs on notice, again

DeepSeek was a viral sensation before OpenAI took over the conversation. Now, with a new release, it's looking to recapture that momentum.

On Wednesday, the Chinese lab DeepSeek rolled out previews of its new flagship models: DeepSeek-V4-Pro and DeepSeek-V4-Flash. DeepSeek released benchmarks that showcase how these models perform competitively against both open-source models and the top proprietary models from the frontier labs. 

The new models offer performance upgrades over their predecessor, including:

  • DeepSeek-V4-Pro:

    • 1.6 trillion total / 49 billion active parameters 

    • Enhanced agentic capabilities and rich world knowledge that lead all open models, trailing only Gemini 3.1 Pro 

    • Outperforms all current open models in reasoning tasks, including math, STEM, and coding, performing competitively with leading closed-source models including GPT-5.4-High and Claude-Opus-4.6-Max

  • DeepSeek-V4-Flash:

    • 284 billion total / 13 billion active parameters

    • Smaller parameter size for faster and cheaper performance
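The "total / active" figures above follow the mixture-of-experts convention: only a small slice of the model's weights fires per token. A minimal sketch of that arithmetic, using the reported numbers (the helper function itself is illustrative, not anything DeepSeek ships):

```python
# Active-parameter fractions for the two reported DeepSeek-V4 configs.
# Figures come from the bullet list above; the helper is illustrative.

def active_fraction(total_b: float, active_b: float) -> float:
    """Share of weights touched per token in a mixture-of-experts model."""
    return active_b / total_b

pro = active_fraction(1600, 49)    # V4-Pro: 1.6T total / 49B active
flash = active_fraction(284, 13)   # V4-Flash: 284B total / 13B active

print(f"V4-Pro activates {pro:.1%} of its weights per token")     # ~3.1%
print(f"V4-Flash activates {flash:.1%} of its weights per token") # ~4.6%
```

The small active fraction is what lets a 1.6-trillion-parameter model run at roughly the inference cost of a 49-billion-parameter dense one.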

The biggest highlight is the training architecture. Compared to DeepSeek-V3, the version of the model that went viral on its release in December 2024 because of its training efficiency and low cost, this model employs a new hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA).

The result is better computational efficiency, with reduced compute and memory costs even as the model handles longer sequences, i.e., a larger context window. Specifically, it supports a context length of one million tokens, addressing what the company describes as a key impediment in reasoning models, where computational constraints have traditionally made extended context lengths impractical.

A model's context window is a key determinant of its overall capabilities, as it governs how much past information the model can retain and reference at any given time. This is especially critical in the age of agentic workflows, where the ability to draw on a large volume of user information as context is essential. Even with the larger window, the new model achieves “significantly lower inference FLOPs,” a measure of computational cost, than DeepSeek-V3.2.
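The mechanics of a context window are simple to sketch: once a conversation exceeds the limit, the oldest tokens fall out of view. A toy illustration of that left-truncation (tokenization is faked, and the 1M constant just mirrors the reported V4 context length):

```python
# Toy sketch of context-window truncation: when a conversation exceeds
# the model's limit, the oldest tokens are dropped first. The constant
# mirrors the V4 context length reported above; real systems tokenize
# text rather than using placeholder strings like these.

CONTEXT_LIMIT = 1_000_000  # tokens, per the reported V4 context length

def fit_to_window(tokens: list[str], limit: int = CONTEXT_LIMIT) -> list[str]:
    """Keep only the most recent `limit` tokens (left-truncation)."""
    return tokens[-limit:] if len(tokens) > limit else tokens

history = [f"tok{i}" for i in range(12)]
window = fit_to_window(history, limit=8)
print(window[0], window[-1])  # → tok4 tok11: the 4 oldest tokens dropped
```

A bigger window simply means fewer of these silent drops, which is why long-running agents care so much about it.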

Interested users can try it on the DeepSeek website via the Expert Mode and Instant Mode options, though you do have to either create an account or use the third-party sign-on options. It is also available via the API.
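DeepSeek's existing API follows the OpenAI-compatible request shape, so a V4 call would plausibly look like the payload below. Note the model identifier "deepseek-v4-pro" is a guess for illustration, not a documented value:

```python
# Hypothetical request payload for DeepSeek's OpenAI-compatible API.
# The base URL is DeepSeek's documented endpoint today; the model name
# "deepseek-v4-pro" is an assumption, not a confirmed identifier.
import json

BASE_URL = "https://api.deepseek.com/chat/completions"

def build_request(prompt: str, model: str = "deepseek-v4-pro") -> dict:
    """Assemble a chat-completion payload in the OpenAI-compatible shape."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

payload = build_request("Summarize attention mechanisms in two sentences.")
print(json.dumps(payload, indent=2))  # POST this to BASE_URL with your API key
```

Because the shape matches OpenAI's, existing client libraries should work by swapping the base URL and key, assuming DeepSeek keeps that compatibility for V4.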

There is no denying the AI market's nearly insatiable hunger for everything bigger, faster, and more efficient. This appetite is only amplified by agentic technologies, which translate these capabilities into much more powerful results. Keeping costs at the center of its models is what made DeepSeek stand out the first time around, and given where things stand today, it is even more impressive that the lab has held firm to that philosophy and is doubling down on it. The DeepSeek-V4-Pro model costs $3.48 per million tokens, compared to $25 for Claude Opus 4.6 and $15 for GPT-5.4, and it achieves comparable benchmark scores. This release could put price pressure on the American labs, particularly as anxiety over competition from China continues to mount. That tension has reached the highest levels of government, with the White House recently accusing China of theft.
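A quick sanity check on what those quoted per-million-token prices imply:

```python
# Relative cost check on the per-million-token prices quoted above.
PRICES = {                      # USD per million tokens, as reported
    "DeepSeek-V4-Pro": 3.48,
    "Claude Opus 4.6": 25.00,
    "GPT-5.4": 15.00,
}

cheapest = min(PRICES, key=PRICES.get)
for name, price in PRICES.items():
    ratio = price / PRICES[cheapest]
    print(f"{name}: ${price:.2f}/M tokens ({ratio:.1f}x the cheapest)")
```

At these list prices, Claude Opus 4.6 runs roughly 7x and GPT-5.4 roughly 4x the cost of DeepSeek-V4-Pro per million tokens, which is the gap the price-pressure argument rests on.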

Sabrina Ortiz, Senior Reporter

TOGETHER WITH GRANOLA

The Deep View team is obsessed with Granola.

We're not talking about the food (although a few of us have an unhealthy interest in that too).

We're talking about the AI notepad we've been using in 2026. It works across our teams, summarizes every meeting, and saves us around 10 hours a week per person.

ENTERPRISE

Anthropic closes critical enterprise agent gap

AI agents are running wild in most businesses right now. And if they aren't running wild in your organization yet, just give it a minute. 

As a result, every day I get a ton of pitches from startups promising to be the solution for organizations to track, control, and manage AI agents within their company networks. 

Earlier this month, Anthropic launched its version of a get-out-of-jail-free card with Claude Managed Agents, which "pairs an agent harness tuned for performance with production infrastructure."

But there were two drawbacks. One is that because it's in the cloud, you have to pay Claude's high token costs. That hasn't changed. You're still basically trading those token costs for the time and expense of building your own agent safety infrastructure. 

The other drawback was that it didn't retain memory between sessions. That's no longer the case as Anthropic has launched built-in memory for Claude Managed Agents in beta. And there's a lot to like.

  • You control the memories: The memories are now stored in files that you can export and share, and you can manage them through the API, which makes them very enterprise-friendly 

  • Agents can share with each other: If you allow it, agents share the context they are learning with each other to make them more effective over time

  • Teams have already been testing it: Netflix has been allowing agents to carry context across sessions and it's allowed teams to stop manually updating so many prompts; Ando (which is building a Slack competitor) is using the feature to retain organizational context for customers; and Rakuten is using the feature to reduce first-pass errors by 97% for its task-based long-running agents

If you're going to bring agents into your organization, you'd better have a solid plan for safety and security, because agents are non-deterministic systems that will do whatever it takes to accomplish a goal a human gives them. And that can include bypassing all kinds of policies, safeguards, best practices, and common sense. But once you have all the proper safeguards in place, you'll also want to let the agents cook and not have to re-explain the same context over and over again. That's what Anthropic is solving for with this new memory feature for Claude Managed Agents. Expect others to follow.

Jason Hiner, Editor-in-Chief

TOGETHER WITH BACKSTORY

The Answers Platform That Helps You Understand Why Deals Are Stalling.

Sales teams are increasingly struggling to get true visibility into their forecasts and pipelines, even as the AI landscape moves fast. Backstory helps CROs and sales teams drive revenue by figuring out what is really happening with a deal and what they should do next.

Backstory is built around how your team actually sells, rather than serving up another dashboard or chart to interpret. Get clear answers that focus on what drives revenue. Trusted by NVIDIA, OpenAI, Zscaler, and Red Hat.

LABS

Frontier AI is outrunning enterprise reality

Today’s AI models are smart. But they're far from infallible. 

Hallucination has been an issue plaguing AI researchers since the tech’s inception. It’s something even the most powerful and intelligent AI models on the market fall victim to, with the problem worsening as tasks become more complex. The answer to this might not be bigger, more intelligent models, but rather more reliable ones, Dan Klein, CEO and founder of AI neolab Scaled Cognition and computer science professor at UC Berkeley, told The Deep View.

“I think a prerequisite for AI to be its best self is having systems that are reliable and trustworthy and controllable,” Klein said. “A system that's just throwing tokens at you without even itself knowing what is true and what is not, it feels like a recipe for systems that you can't ship because they just aren't reliable.” 

It’s natural that the industry believes scale will close the accuracy gap, said Klein, because it was true for a long time. Instances of hallucination vastly decreased as models became larger and smarter. 

  • But because frontier language models have used up most of the available data at their disposal, that gap isn’t going to get much smaller. “We've kind of consumed the most important bits.” 

  • Additionally, language models are not “truth machines,” Klein noted. They’ve simply been optimized to create outputs that are “indistinguishable from the truth.”

For enterprise, the dangerous hallucinations aren’t the obvious ones, like telling users to put rocks on their pizza, said Klein. Rather, it’s the ones that can go easily unnoticed that present the highest risks. For instance, if a customer asks a chatbot for their bank balance and it pulls up an incorrect number, one that is off by a single digit but still resembles a plausible amount, that could lead the customer to make bad transactions.

The problem comes down to how these models are architected, Klein said. And it’s something that Scaled Cognition is dedicated to solving. Rather than doing “next token prediction” and retrofitting for reliability, as conventional LLMs are built to do, Scaled Cognition’s models treat the information prompted for as “first-order objects” that are prioritized, giving the enterprise more control over what the model will and won’t do. 

And though Scaled Cognition’s model, APT-1, is smaller than the massive models that frontier labs continue to build, its size makes it far more efficient, and “its correctness properties are in the structure of the model itself,” Klein said. 

“The problem is not that the models aren't intelligent enough. It's reliability,” said Klein. “I feel like that's just not a consensus by any means. Superintelligence is going to have its place. But I think for the vast majority of things we want to do with AI, super-reliability is more important than superintelligence.”

The crisis of scale versus accuracy may only be amplified by the desire for speed. The AI industry is obsessed with doing things quickly: Faster outputs mean faster results, and faster results (in theory) mean faster returns. In the eyes of shareholders and major tech firms, bigger and faster will always be better, whether or not that kind of intelligence is necessary for all of the tasks most people will want AI to handle. Meanwhile, enterprises are under pressure to embed AI more deeply into their processes, offloading an increasing number of tasks to agents while reducing human oversight. To Klein’s point, stealthy hallucinations that mimic the truth may slip through the cracks, and risk widening the disconnect between frontier labs' priorities and those of enterprise customers.

Nat Rubio-Licht

LINKS

  • QA Tech: Delegate testing to AI. Validate releases with dynamic regression and exploratory tests run by agents that act like your customers. Try QA.tech now. (sponsored)

  • Codex: Browser use allows Codex to control the in-app browser

  • Notion: In Plan Mode, agents can do work in a read-only sandbox before acting

  • Uber Eats: Now available inside Claude

  • CodeRabbit: Has been testing GPT-5.5 in early access

  • Adobe: 2026 AI/ML Intern - Machine Learning Engineer

  • Capital One: Applied Researcher II

  • SAP: Senior Customer Success Manager

  • ServiceNow: AI Solution Architect

GAMES

Which image is real?


POLL RESULTS

Do you use AI to help prioritize your email and chat messages?

Yes (12%)
No (80%)
Other (8%)

The Deep View is written by Nat Rubio-Licht, Sabrina Ortiz, Jason Hiner, Faris Kojok and The Deep View crew. Please reply with any feedback.

Thanks for reading today’s edition of The Deep View! We’ll see you in the next one.

“Some trash on the side of the walkway convinced me that this was real.”

“The sky looks more natural and the walkway appears more in tune.”

“I didn't think anyone would prompt to have them walking in a bike lane.”

“The way you could see the Ferris wheel through the trees gave it away. Seemed too perfect.”

“The trees in [this image] looked like oil paintings.”

“In [this image] some of the clouds were too neatly aligned.”

If you want to get in front of an audience of 750,000+ developers, business leaders and tech enthusiasts, get in touch with us here.