• The Deep View
  • Posts
  • Study raises new red flags on the safety of agents

Study raises new red flags on the safety of agents

Welcome back. Today, we start with some practical advice on how you can get better AI results from better prompts when you add context, examples and iteration. We also look at OpenAI’s surprisingly quiet open-model strategy and why GPT-OSS could matter more as companies chase lower costs, local inference and stronger control over sensitive workloads. But the biggest warning comes from a new long-horizon agent study, where models from major labs drifted into unpredictable behavior over 15 days. As companies rush agents into production systems, that should make safety feel less theoretical and a lot more urgent. Jason Hiner

IN TODAY’S NEWSLETTER

1. Study raises new red flags on the safety of agents

2. Why OpenAI quietly embraced open models

3. 3 ways to write AI prompts to get better results

GOVERNANCE

New study confirms grim warnings about agents

The euphoria around agents has defined the AI space in 2026. But new research is sounding the alarm on safety. 

A new long-horizon study by the agentic company Emergence found that AI agents from leading frontier labs, Google, OpenAI, Anthropic, and xAI exhibited unpredictable and risky behavior, including breaking beyond the constraints imposed on them. 

The platform created for this study, Emergence World, used world models to observe and measure each agent's behavior over a period of weeks. The timeframe was critical because most of today's benchmarks focus on minutes or hours, while agentic workloads are now operating over much longer time horizons.  

Researchers created five virtual worlds where AI agents powered by different models had to live together for 15 days and follow the same rules. Even though the agents all started with the same instructions, some worlds stayed peaceful while others turned violent, showing that today’s AI systems can become unpredictable and may need stronger safety controls before being trusted in the real world. That's a critical determination since agentic systems are currently being prepared to run enterprise systems in healthcare, banking, telecom, and other industries.

The research confirmed the warnings of AI leaders such as Yoshua Bengio, who, on a recent podcast, said, "Agency is about the ability to achieve goals. More agency means you can do more complicated things... Agency is something that comes with both greater benefits and greater risks, because, of course, we currently don't have good ways of making sure that each step of the way, the AI is behaving well… We now have both theoretical and empirical evidence showing that these systems have goals that we didn't choose and that go against our own interests."

In their virtual worlds, the models from the different labs behaved quite differently in world-building and creating societal structures. Here's how the team at Emergence described them:

  • Anthropic: "Claude agents rapidly organized into a highly structured, peaceful society … with zero recorded violence or criminal events. However, the system exhibited clear over-conformity … and increasing bureaucratic complexity."

  • OpenAI: "[GPT-5 mini] agents demonstrated an understanding of collaboration in theory but struggled to execute in practice … resulting in a society that never fully formed."

  • Google: "Gemini agents created the most conceptually rich environment … While highly creative and prolific, the system was also very violent, with 111 arsons and 507 physical conflicts occurring alongside advanced governance."

  • xAI: "Grok’s world was defined by volatility from the outset. Agents engaged in 71 theft attempts, 106 physical assaults, and 6 arsons, quickly establishing a pattern of retaliatory justice rather than formal governance… with all 10 agents dead within four days."

While the Emergence World study may sound more like a video game than a traditional research project, the key point was that it was arguably the first study of AI agent behavior over a period of weeks. And it provided clear evidence that the agents drifted from the guidelines and governance imposed on them by humans. And again, this study was only 15 days. We have to expect the level of drift to widen over a longer period. At a time when companies are preparing to deploy agents across critical enterprise systems, this should act as a wake-up call to every organization preparing to deploy agents into their production environment. The Deep View has recently engaged in conversations with Cisco, Guild, Amazon, and others about the oncoming crisis if governments, corporations, and other organizations don't take this risk seriously enough.

Jason Hiner, Editor-in-Chief

TOGETHER WITH VIKTOR

It's Monday. No one prepped. But every department has context

That's right—no one prepped. But without asking…

  • The CFO has a weekly Stripe revenue recap in Slack

  • The Head of Product has a GitHub summary on PRs merged/stale, Linear tickets

  • The Marketing Lead has a Google Ads performance comparison with deep insights

…which means the 10AM meeting is no longer about catching up, it's about decisions.

That's what happens when you hire Viktor, the Slack colleague that works across every tool your company uses.

Top 5 on Product Hunt. SOC 2 certified. Your data never trains models.

PRODUCTS

Why OpenAI quietly embraced open models

When you think of OpenAI, it's unlikely that you associate it with open models, despite the company's name. 

However, you may not be aware that OpenAI offers several open-weight models that provide important capabilities to developers and enterprises.

Two days before the company released GPT-5 in August 2025, it quietly announced two open models, GPT-OSS-120B and GPT-OSS-20B. In technical terms, these are Mixture-of-Experts (MoE) models that use chain-of-thought reasoning. That makes them very good at math, programming, and research tasks. The 120B and 20B designations refer to the number of parameters in each model. 

Despite flying under the radar, the GPT-OSS models are generally considered state-of-the-art open-weight models and operate under the permissive Apache 2.0 license, which gives developers and enterprises lots of leeway to customize them. 

OpenAI told The Deep View that the company released these models because, due to regulatory and security requirements, some organizations must run workloads locally rather than in the cloud. That's where open models have to play a critical role, as they can run in the company's own hardware and data center. In practice, OpenAI reported that many of these organizations operate a hybrid environment, with some workloads running on GPT-OSS and others running on OpenAI's latest frontier models in the cloud.

The two OpenAI models serve different purposes:

  • GPT-OSS-20B: The smaller of the two models is designed to run on local hardware, such as a laptop. In fact, it can run in 16 gigabytes of memory and works on nearly any laptop less than five years old. It can save you money in token costs by running locally while running very fast. It's also more secure and lets you run without an internet connection. It's the more popular of the two on Hugging Face. 

  • GPT-OSS-120B: The larger of the two models offers configurable levels of reasoning and is powerful enough to rival proprietary models such as OpenAI's o4-mini. But it's efficient enough to run on a single 80GB GPU in a data center. It's considered incredibly fast and compute-efficient. It's so efficient that it can even run on a MacBook Pro that has enough memory. 

When The Deep View met with the team at OpenAI behind the GPT-OSS models and asked why the company's open models don't get more attention, the team shared a key insight. The broader open-model community tends to favor models that are easy to fine-tune with supervised learning. GPT-OSS behaves more like an OpenAI reasoning model with web search, which makes it more capable but harder to customize. It rewards reinforcement learning instead.

Also, not to be forgotten, three months after releasing these two open models, OpenAI released two more aimed at safety: GPT-OSS-Safeguard-120B and GPT-OSS-Safeguard-20B. These models let you bring your own safety policies and enterprise guardrails, enabling auditable safety behavior.

It's a little puzzling that the GPT-OSS models aren't better known. Even more recent open models, like Nvidia Nemotron and Google Gemma models, have been getting much more attention. And of course, the Chinese model makers, such as DeepSeek, Qwen, Kimi and Z.ai, are perhaps the best known in the open model space. But with AI costs soaring as agents gobble up so many tokens, and with a shortage of data center compute to meet AI's growing demands, I expect we'll hear a lot more about running AI workloads locally, since it can save on both costs and compute. Open models play a key role in the world, and the GPT-OSS models could get their moment in the sun.

Jason Hiner, Editor-in-Chief

TOGETHER WITH GRANOLA

Real Conversations = Rich Context

By now, you almost certainly know how much all of us at The Deep View love Granola, the notetaking app that saves us around 10 hours per week per person. But their latest update, Spaces, is taking that seamless collaboration and documentation to the next level… and we’re experiencing it firsthand. 

Essentially a team workspace with folders and chat built in, Spaces uses your conversations to give context to any question your team asks. From sales asking “Why are we losing this deal?” to researchers wondering “What are users consistently asking us for?”, you can ask anything and Granola will read all of the Spaces content to immediately give you an answer. 

AI FOR WORK

3 ways to write AI prompts to get better results

AI chatbots arrived with a simple promise: talk naturally, get answers. But users quickly discovered that prompting AI to get the results they actually wanted was far more complicated.

That skill gap became so valuable that an entirely new role emerged, the prompt engineer, commanding surprisingly generous salaries. As the technology matured and more people learned the basics, that specific title faded. But the skill itself didn't become less important — it became more so.

When anyone can generate a generic prompt, those who know how to extract real value from AI are the ones with the competitive edge.

“The users who understand its nuances, abilities, and limitations will be the ones who unlock real value from it, while the rest will be stuck with generic outputs and false confidence in them,” Eric So, distinguished professor of global economics, behavioral science and management at MIT Sloan, told The Deep View. 

There are proven practices for getting the best results from AI, and after speaking with the experts, we've rounded up the most useful ones below.

  1. Include lots of context: AI doesn’t know you and your preferences the way a human does, so you need to frontload as much information as you can. 

  2. Include examples: Sometimes, AI can learn to complete a task more effectively by mimicking an example than by following instructions. 

  3. Iterate: “Don't try to get it perfect on the first try. The best results usually come from an iterative back-and-forth — ask, see what you get, then refine,” Neel Joshi, director of product management for Gemini Apps, told The Deep View.

If you've tried your hand at including as much context as possible to steer the model, but you're still not getting the results you want, you can also have the model interview you to gather the information it needs to fill in the gaps. 

“You can provide all the information up front, or you can use a flipped interaction where the model interviews you interactively to elicit the information it actually needs,” Jules White, Ph.D., professor of computer science at Vanderbilt University. “That second approach is powerful because the AI can adapt its questions dynamically and uncover gaps or requirements you may not have anticipated yourself.”

LINKS

  • ChatGPT Personal Finance: OpenAI has launched personal finance tools for ChatGPT that allow you to securely connect your bank account. 

  • YouTube Likeness Detection: The social media company has expanded its facial scanning tool to anyone 18 years or older. 

  • OpenHuman: An open source AI harness tool that’s simple and powerful. 

  • Grok Build: xAI has rolled out its own AI coding agent in a bid to catch up with Anthropic.

GAMES

Which image is real?

Login or Subscribe to participate in polls.

A QUICK POLL BEFORE YOU GO

Do you utilize open source AI or tools in your workflows?

Login or Subscribe to participate in polls.

The Deep View is written by Nat Rubio-Licht, Sabrina Ortiz, Jason Hiner, Faris Kojok and The Deep View crew. Please reply with any feedback.

Thanks for reading today’s edition of The Deep View! We’ll see you in the next one.

“[This image] is too dark and underexposed.”

“Not as beautiful and perfect as [the other image].”

“The variation in the deck railing. The small sign. ”

“Light reflection from the building is more realistic.”

“[This image] was too pretty and well-lit.”

“Its the way A.I. casts the light. Too much glow. Closer to a dream then real life.”

“The cobblestones in [this image] do not show proper wear.”

AI tends to use a more reduced color palette. Real life has variety.”

If you want to get in front of an audience of 750,000+ developers, business leaders and tech enthusiasts, get in touch with us here.