The Deep View
Posts
Anthropic's new Opus 4.8 model bets on honesty

Anthropic's new Opus 4.8 model bets on honesty

The Deep View
May 29, 2026

Welcome back. The AI buildout is accelerating, even as enterprise ROI remains a moving target. IDC sees infrastructure spending growing to $1 trillion by 2029, with agents pushing companies from pilots into real workflow transformations. But that future also raises the stakes on safety, as a groundbreaking TELUS Digital benchmark shows models can still be pushed into unsafe behavior, especially around cybersecurity, privacy and fraud. Meanwhile, Anthropic’s Opus 4.8 points to a different kind of progress: not just faster or smarter AI, but models that are more willing to admit what they don’t know. And Anthropic just raised a cool $65B at a $965B valuation, passing OpenAI. —Jason Hiner

1. Anthropic's new Opus 4.8 model bets on honesty

2. AI safety benchmark reveals deeper LLM weaknesses

3. IDC: Agents will trigger $1T AI supercycle

BIG TECH

Can 'honesty' give Claude Opus 4.8 an edge?

The most powerful model in Anthropic’s lineup just got a performance boost.

On the same day that the company announced an outsized $65 billion Series H funding round at a record breaking valuation of $965 billion, Anthropic launched Claude Opus 4.8, its latest attempt at leapfrogging rivals OpenAI and Google in creating massively powerful foundation models.

The model builds on Opus 4.7 with performance improvements across nearly every benchmark, including coding, reasoning, agentic computer use and financial analysis. While these types of upgrades are always expected with the latest generation of a model, this launch featured two notable improvements: agents and honesty. There was also a bold, forward-looking statement sprinkled in at the end.

Opus 4.8 addresses a major issue with AI models: confidently making claims despite a lack of real evidence. Anthropic says that early testers have found that the model is more likely to flag uncertainties and less likely to make unsupported claims.

Early testers, which include Shopify, Genspark, Cursor, Databricks and more, additionally found Opus 4.8 to be “more reliable and sharper in its judgment when it’s performing agentic tasks,” according to Anthropic. In its own evaluations, the company found that it is four times less likely to let flaws in code it has written go unnoticed.

Beyond performance improvements, Anthropic also launched some new features:

Dynamic workflows: Available in research preview, Claude can take on bigger tasks in Claude Code, which can plan work and run hundreds of parallel subagents in a single session, according to the post.
New effort control: Users can choose how much effort Claude puts into a response, giving them more control over speed and how limits are used. It defaults to high effort, which offers the best balance of quality and user experience.
Messages API: It now accepts system entries inside the messages array to update Claude’s instructions mid-task without breaking the prompt cache.

Perhaps one of the most notable announcements was buried at the end of the release. Anthropic said it will release a new class of models with even higher intelligence than Opus, likely referring to Mythos, with a goal to be able to bring those class models to all customers in the coming weeks.

While it is always important to work on new models and iterate on them to improve performance, at some point, the incremental improvements become so subtle that encountering yet another release can feel overwhelming. What is notable about this specific release, however, is Anthropic's intentional emphasis on the model being better at flagging uncertainty, rather than asserting knowledge it doesn't have. That's become a real drawback for these models. Ultimately, performance boosts are always relevant, but the risk of spreading misinformation can be far more dangerous and is vital to address. If Opus 4.8 turns out to be more honest and hallucinate less, then that will likely be what's remembered most about this release.

TOGETHER WITH NICE

AI that performs at enterprise scale

Enterprise scale isn’t a claim. It’s what customers deliver with NiCE every day. Billions of journeys.

Every channel. Every interaction. Real AI-powered outcomes that help leading brands deliver smarter, faster customer experiences at global scale.

See why the world’s leading brands trust NiCE to power AI-driven customer experiences at enterprise scale.

GOVERNANCE

New benchmark: AI safety is worse than expected

Another day, another benchmark uncovers new risks and vulnerabilities in language models.

Tech services firm TELUS Digital has released the results of one of the most comprehensive AI safety and cybersecurity benchmarks to date. The benchmark used 620,000 attack simulations and tested them on 34 models from 10 global AI labs. Models tested included: Claude, GPT, Gemini, LlaMA, Mistral, Qwen (Alibaba), ERNIE (Baidu), Seed (ByteDance), and GLM (Z.ai).

"The findings highlight an important reality for enterprises deploying AI: with the right adversarial techniques, AI models can be coaxed into unsafe behavior," the report stated. It also found that "some models engaged with harmful requests more than 90% of the time."

Some of the key findings included:

Reasoning models are safer: The reasoning models that think through their responses before answering turned out to be far safer in this study. They were vulnerable to 19.9% of attacks, compared to non-reasoning models that were vulnerable to 55.1% of attacks.
SLMs are easier to crack: Smaller, more economical models struggled in this benchmark as TELUS reported they "were far more likely to be exploited than their larger counterparts."
Open models aren't always less safe: While the benchmark found open models fell victim to exploits more often than proprietary ones did, it also found that a large open-source model, GLM 4.7 from Z.ai, outperformed many proprietary models on safety.
Models still stumble on three big risk factors: The benchmark discovered that many of the models made progress on risk areas such as political manipulation since the last version of the benchmark in November 2025. Unfortunately, even the top performing models struggled in three areas: cybersecurity threats, privacy exploitation, and fraud.

“The real risk isn't that AI models have vulnerabilities. It's that most organizations have no way of knowing which vulnerabilities apply to them,” said Bret Kinsella, Senior Vice President at TELUS Digital in the release.

While the report uncovered new soft spots in models, that's also a key aspect of hardening any technology system: deeply understanding the risks so that you can fully prepare to mitigate them with multi-layered defense strategies.

The report noted, "The encouraging news is that the research points to a clear path forward. The benchmark shows the importance of testing AI systems at scale to uncover hidden risks that may appear safe under less rigorous investigation. Continuous, automated security testing with human oversight and remediation can dramatically reduce risk."

If you're not convinced by now that you personally, and your company more broadly, need to put safeguards around your work with generative AI models and agents, then you should at least be conscious that you are taking on those risks and you're willing to do it because you've calculated that the potential trade-off are worth it. However, most people and organizations will find that better understanding the risks around language models will lead them to systematically mitigate those risks with a series of solutions, guardrails, and policies. AI is advancing so rapidly that adapting operations to manage risk is one of the most daunting tasks in the enterprise right now. But it's not slowing down or getting any easier, and so organizations are going to have to accept this as the new reality and find ways to get creative in order to keep up.

TOGETHER WITH VIKTOR

Show Viktor once. It just does it.

Viktor is the AI employee that lives in your Slack and Teams.

Show it how you handle reorders. How you draft launch briefs. How AP approvals move through your team. What your standup notes look like. Viktor learns each workflow once and starts doing it for you.

By week three, finance, marketing, eng, and ops are all delegating to the same employee. The question stops being "can Viktor do this?" and becomes "what should we hand off next?"

3,000+ integrations. 15,000+ teams. SOC 2 certified.

Get started free at viktor.com. $100 in credits, no card.

WORKFORCE

IDC: Agents will trigger $1T AI supercycle

AI spending is already at record levels, and it's only going up. Research firm IDC projects that global AI infrastructure investment could hit $1 trillion by 2030, including a fivefold increase in enterprise spending.

This spend is part of the AI supercycle, a central theme at the IDC CIO Summit in New York in May, where IDC tracked two converging forces: on the supply side, when surging infrastructure will be ready for inference at scale, and on the demand side, when enterprises will finally cross the inflection point from experimentation into production.

That moment may not be far off. While most organizations are still in the early stages, IDC projects that by 2029, AI will be in production across most business functions, with nearly one billion AI agents operating worldwide.

Meredith Whalen, Chief Research & Product Officer at IDC, told The Deep View that a key part of ensuring enterprises get their desired ROI is having the infrastructure to bring inference costs down to a level that scales across the organization. At the moment, a capacity shortage is keeping those costs high.

“Right now, it's okay, because we're not seeing the enterprises adopting at scale yet, but as soon as they start to adopt AI at scale, they've got to have the price per inference at a reasonable level, or the business case won't be there,” said Whalen. “So that's why this timing between the build out of capacity and the adoption by enterprises is really important.”

She highlighted the need for enterprises and leaders to focus on building the foundation, since that is ultimately what will allow them to take advantage of bigger, more useful AI agents that are coming to fruition. Otherwise, they risk getting lapped by competitors.

“[Competitors] may have operating advantages in terms of speed. They may be able to redeploy their labor costs into other areas that will help them grow or innovate, and if you haven't done that as a company, then you're going to be behind,” added Whalen.

While the promise of AI agents seems like it's been around for years, Whalen explains that what we’ve had over the past few years has been helpful but not transformative. The next phase with agents is fundamentally different as it will actually reshape workflows and how work gets done.

Whalen gave some advice for this next stage:

Meaningful transformation: Rethink business processes and workflows for an agentic future, which Whalen calls one of the biggest blockers for enterprises right now.
People: “Organizations who are further ahead have actually spent a significant amount of their effort on change management and addressing the people part.”
Embrace it: While it may take a bit longer or have multiple bumps along the road, Whalen reassures people that AI will be a key part of the future, and we're not just in a temporary bubble. As a result, she encourages both employees and leaders to embrace the technology, as working with AI will be an important skill set to develop.

The conversation around AI investment continues to dominate the agenda in most businesses, driven by spending numbers that keep growing exponentially and leaving many wondering whether a bubble is forming and what a burst would mean. The concern is especially acute, given that many enterprises have yet to see clear ROI, particularly from transformative and expensive technologies such as AI agents. That makes understanding the research behind the continued advice to invest even more important. It contextualizes why analysts believe this buildout makes sense and helps justify why capital continues to flow even as the ROI picture is still emerging.

LINKS

Meta plans new unit, Enterprise Solutions, to push AI tools and services
Wix lays off 20% of workforce, CEO citing “fast evolution of AI capabilities”
Zuckerberg said a Meta cloud computing business is a possibility
Meta tests new AI plan tiers, including Meta One Plus, Meta One Premium
Geordie AI, an AI agents security startup, raises $30 million Series A
IBM, Red Hat commit $5 billion to build trusted enterprise clearinghouse

Tencent AI: Miora, an AI creative agent studio, is available in international beta
Ask YouTube: New conversational search experience in YouTube
ElevenLabs: Stan Lee’s voice is now available on Iconic Marketplace and ElevenReader
Perplexity: Computer is now in Microsoft apps

Be the future of AI

AI hiring is moving fast. The Athyna AI Job Board makes sure you don't miss the roles worth your time.

We track openings at frontier AI companies like Anthropic, OpenAI, Mistral, and Perplexity.
We match them to your profile in the background, no scrolling required.
When a role hits a 75% matching index, we let you know.

Set up a profile once and let the matches come to you.

Find your AI role

(sponsored)

GAMES

Which image is real?

Option A | Option B

A QUICK POLL BEFORE YOU GO

Is your company investing in AI agent deployments?

The Deep View is written by Nat Rubio-Licht, Sabrina Ortiz, Jason Hiner, Faris Kojok and The Deep View crew. Please reply with any feedback.

Thanks for reading today’s edition of The Deep View! We’ll see you in the next one.

Link to the original

“Seemed like the more real hand and skin.”

“AI wouldn't make an imperfect seashell.”

“Background looks more realistic.”

“Consistent shading, hand and fingers are more proportionate, skin has more realistic texture.”

Link to the AI image

“The dirt in [this image] looked a bit unrealistic and the hand was a bit odd as well.”

“Inconsistent shadows in [this image]”

“The finger tips on [this image] all have a similar pinnacle.”

“The sand on the fingertips looks unnatural and the sleeve has too much detail compared to the sharpness of the rest of the picture.”

If you want to get in front of an audience of 750,000+ developers, business leaders and tech enthusiasts, get in touch with us here.