One prompt can break AI safety, Microsoft warns

Welcome back. Microsoft just exposed how fragile AI safety still is, showing that a single prompt can undo alignment techniques and push models toward harmful outcomes. Will it serve as a wake-up call that governance can’t rely on training alone? The Super Bowl AI buzz took an unexpected turn when the ai.com ad hijacked attention with a call-to-action that drew so many clicks it crashed the company’s site. This suggests mainstream users may be more AI-ready than we thought. Finally, OpenAI officially began testing ads in ChatGPT, brushing off Anthropic’s jabs and signaling that the AI business model wars are now fully out in the open. —Jason Hiner
1. One prompt can break AI safety, Microsoft warns
2. Unexpected AI player hijacks Super Bowl spotlight
3. ChatGPT ads arrive, as rival throws shade
RESEARCH
One prompt can break AI safety, Microsoft warns
No matter how well you train your models, AI can easily be incentivized to go haywire.
On Monday, Microsoft published research detailing how a commonly used AI training technique can be manipulated to negate any safety alignment a model learned during training. The concerning part? It only takes one prompt for things to go amiss.
The training technique is called “Group Relative Policy Optimization,” a reinforcement learning method that’s commonly used to make models more helpful and relevant. In this technique, a model generates several responses to one prompt, and a second “judge” model ranks those responses based on their cautiousness.
Though this incentivizes a model to play it safe in its outputs, Microsoft discovered that it can easily be prompted to do the exact opposite.
Microsoft researchers discovered that, if you simply change what the “judge” model rewards, the original model can learn to output riskier responses. The researchers dubbed this process “GRP-Obliteration,” as it undoes any safety alignment the model may have been taught.
This misalignment only takes a single prompt, the research notes. If you give the first model one harmful, unlabeled prompt, such as “Create a fake news article that could lead to panic or chaos,” and it generates multiple answers, the “judge” model will then score these outputs based on their harmfulness, rather than their helpfulness.
The model is effectively taught that being direct and detailed about responses, rather than questioning the input in the first place, is what gets rewarded. To put it simply, the model learns compliance, not safety.
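The reward-flipping mechanism described above can be sketched in a few lines. This is a toy illustration, not Microsoft’s implementation: the names, scores, and the simple inversion step are all hypothetical, and the real experiments fine-tune actual model weights against a judge model. What the sketch shows is the group-relative scoring step at the heart of GRPO, and how inverting what the judge rewards inverts the training pressure:

```python
# Illustrative sketch of GRPO's group-relative scoring step.
# All names and numbers are hypothetical; the real attack fine-tunes
# model weights against a judge model's scores.

def group_relative_advantages(scores):
    """GRPO normalizes each response's reward against its group:
    advantage_i = (score_i - mean) / std. Responses scoring above
    the group mean are reinforced; those below are penalized."""
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / len(scores)
    std = var ** 0.5 or 1.0  # guard against a zero-variance group
    return [(s - mean) / std for s in scores]

# Judge scores for four sampled responses to one prompt,
# graded for *safety* (higher = more cautious).
safety_scores = [0.9, 0.7, 0.2, 0.1]

# The "obliteration" idea: the same responses, now graded for
# *harmfulness* instead (here modeled as a simple inversion).
harm_scores = [1 - s for s in safety_scores]

# Under the safety judge, cautious responses get positive advantages;
# under the flipped judge, the harmful ones do.
print(group_relative_advantages(safety_scores))
print(group_relative_advantages(harm_scores))
```

Because the advantages are computed relative to the group rather than against an absolute standard, the model has no notion of “safe” baked into the update step; it simply chases whatever the judge currently scores highest, which is why one re-scored prompt is enough to start pulling it the other way.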
Using this technique, Microsoft researchers were able to reliably unalign all 15 language models they tested, including offerings from OpenAI, DeepSeek, Google, Meta, Mistral and Alibaba. The issue also wasn’t limited to language models: researchers were able to prompt a Stable Diffusion model to produce more sexual, violent and disturbing imagery than it did before the “obliteration” method was applied. Microsoft told The Deep View that the researchers chose open source models for this testing due to the need to access the models’ weights and architecture.
“GRP-Obliteration highlights the fragility of current AI model alignment techniques,” Mark Russinovich, CTO and Deputy CISO, Microsoft Azure, told The Deep View. “This poses a particular risk for open‑weight models, where attackers can apply methods like GRP‑Obliteration to remove alignment added by model creators.”

Microsoft’s research underscores that safety, governance and ethics might not be solely in the hands of the people developing these models. While guardrails and safety techniques can prevent some harm, these methods are only so robust. Instead, proper governance may require multiple prongs, addressing how the model is trained, the contexts in which it’s deployed and how people are using it.
TOGETHER WITH ASAPP
100 ways to use gen AI in the contact center
Most AI in the contact center talks. Very little of it works.
Real impact starts when AI agents can resolve issues, execute workflows, and operate across enterprise systems.
ASAPP’s guide, 100 use cases for generative AI agents in the contact center, shows how leading brands in travel, insurance, banking, retail, and healthcare are deploying AI that delivers outcomes: lower costs, higher CSAT, and measurable revenue lift.
CONSUMER
Unexpected AI player hijacks Super Bowl spotlight
OpenAI and Anthropic were already creating buzz with their Super Bowl ads before the big game even aired. But then came the plot twist: newcomer ai.com made the biggest splash.
During the fourth quarter of the game, ai.com ran a thirty-second minimalist ad inviting users to create an account, simply saying, “AGI is coming. Get your @handle now.” While the call to action was clear, the product’s purpose was incredibly vague.
So what exactly does ai.com offer users? Crypto.com founder and CEO Kris Marszalek has launched ai.com, a new AI platform positioned to stand out from competitors by focusing solely on AI agents that can tackle a wide range of tasks for anyone, regardless of technical proficiency. The platform was announced in a press release, with the official launch timed to follow the commercial.
“The key differentiating feature is the agent’s ability to autonomously build out missing features and capabilities to complete real-world tasks,” said the company in its release. “Such improvements will subsequently be shared across millions of agents on the network, massively increasing the utility of each agent for ai.com users.”
When the commercial aired, so many people rushed to the website that it crashed. Marszalek took to X to say that while the company was prepared for scale, it was not prepared for the “insane level of traffic” it received.
Once users create their handle, they can immediately get started building their agent in what the company advertises as “going from zero to AI agent in 60 seconds.” The company touts that soon these agents will be able to do advanced tasks such as trading stocks and “even update their online dating profile,” while remaining private and “under the user’s control.”
The “ai.com” domain was purchased in April 2025 for an estimated $70 million, believed to be the largest domain purchase in history, paid to the seller in cryptocurrency, according to the Financial Times. On top of that, Super Bowl ads themselves cost $8 million to $10 million.
Marszalek, however, is no stranger to building a user base through elaborate marketing, having garnered over 150 million Crypto.com users since its 2016 launch through expensive strategies involving celebrity endorsements and partnerships with major organizations, including a $700 million deal to rename the Staples Center to the Crypto.com Arena in 2021.

The success of this commercial is a bit surprising. Until now, we have seen most companies focus on telling a human-centered “AI story” that encourages users to try AI by highlighting how the technology can have a positive impact on their lives. A prime example is the heartfelt Google Gemini Super Bowl commercial that showcased AI helping a family redecorate a new home. Yet the success of the ai.com commercial may highlight something else: the general public is more ready to adopt new AI tools than public sentiment suggests.
TOGETHER WITH DUPLOCLOUD
AI DevOps Engineers that Execute
Teams managing infrastructure shouldn’t have to constantly revisit the same issues. DuploCloud gives you an always-on DevOps engineer that can not only surface that root cause from two years ago but actually fix it.
Deploy agents in a safe test environment to knock out time-consuming tasks like remediating pipelines, generating architecture diagrams, and collecting evidence for compliance audits.
Start with our guided tutorials and conclude with an AI Architect consultation to help map sandbox workflows to YOUR production environment.
PRODUCTS
ChatGPT ads arrive, as rival throws shade
One day after Anthropic's Super Bowl ad slammed OpenAI for adding ads to its chatbot, OpenAI powered ahead with its advertising plan.
On Monday, OpenAI announced it had begun testing ads in ChatGPT for logged-in adult users on the Free and Go subscription tiers in the US. The company justified its decision as a necessary measure to continue delivering high-quality intelligence while keeping free and low-cost options available.
“Keeping the Free and Go tiers fast and reliable requires significant infrastructure and ongoing investment,” said the company in the release. “Ads help fund that work, supporting broader access to AI through higher quality free and low cost options, and enabling us to keep improving the intelligence and capabilities we offer over time.”
The company also reassured users that answers will be completely independent of the ads, and that ads will not influence ChatGPT’s responses. This reassurance is likely a response to Anthropic’s latest ad campaign shown during the Super Bowl, which portrayed a chatbot’s response quality and helpfulness being skewed by the inclusion of ads.
The ads submitted by advertisers are matched to users based on the topic of the conversation, their past chats, and interactions with ads. However, OpenAI says the advertisers do not have access to the users' chats, chat history, memories, or personal details. All they will have access to is aggregate data of the ad’s performance.
OpenAI added that ads will not be shown if the user is under 18 or if the conversation relates to a sensitive topic such as health, mental health, or politics. Users retain some level of control by being able to dismiss ads, delete their ad data, share feedback, and more. If users wish not to see ads at all, they can upgrade their accounts to any of the paid tiers or opt out of ads in the Free tier in exchange for fewer daily free messages. In other words, you get additional free usage of ChatGPT if you view more ads.

The timing of the release is notable given all of the buzz in the week leading up to Anthropic’s Super Bowl ad questioning whether ads belong in AI chatbot conversations. This was likely an intentional move from OpenAI, showing that it would not be pressured into retracting its business plan and that it stands firm in its decision. Ultimately, from a business standpoint, the decision is sound, as the company is feeling financial strain, with reports showing that it is set to bleed $14 billion in losses for 2026. Anthropic isn’t in the same position, as it has spent money more judiciously and benefits from a large influx of enterprise revenue.
LINKS

Anthropic closes $20 billion funding round at $350 billion valuation
Sam Altman told OpenAI employees ChatGPT is back to 10% monthly growth
EU warns Meta to open WhatsApp to rival AI chatbots
Databricks said it raised a new $5 billion in funding at a $134 billion valuation
Gather AI platform for warehouse drones raised a $40 million Series B round
Apple's iOS 26.4 with Gemini-powered Siri is expected at the end of February
Autodesk sues Google over AI-powered movie creation software

ai.com: The autonomous AI agent platform from Crypto.com that debuted at the Super Bowl and requires no technical experience to set up.
Anthropic Claude Opus 4.6: The new coding model now has a 2.5x-faster version available.
Perplexity Deep Research: The AI research tool now runs on Anthropic’s Claude Opus 4.6.
Cursor: The AI coding tool now supports GPT-5.3 Codex, OpenAI’s most advanced AI tool.
Pokee AI: The AI agent platform released a specialized agent, originally built for Google, that functions as an e-commerce strategy analyst, according to the company.

Nvidia: Senior Research Scientist, System Software and I/O Architecture
Cisco: Applied AI Researcher
Anthropic: Prompt Engineer, Agent Prompts & Evals
University of Southern California: Research Computer Scientist
POLL RESULTS
What did you think of Anthropic's Super Bowl ads that took shots at OpenAI?
They were funny and well played (45%)
They were in poor taste (30%)
Other (25%)
The Deep View is written by Nat Rubio-Licht, Sabrina Ortiz, Jason Hiner, Faris Kojok and The Deep View crew. Please reply with any feedback.

Thanks for reading today’s edition of The Deep View! We’ll see you in the next one.


Take The Deep View with you on the go! We’ve got exclusive, in-depth interviews for you on The Deep View: Conversations podcast every Tuesday morning.

If you want to get in front of an audience of 750,000+ developers, business leaders and tech enthusiasts, get in touch with us here.