OpenAI turns Codex into a daily work agent

Welcome back. We’re looking at new research showing that fine-tuning models for high-stakes fields like law and medicine can weaken safety guardrails in unpredictable ways. Meanwhile, AI search is forcing marketers to rethink how brands get discovered, so we talked with Adobe’s Pat Brown about why the old SEO playbook is breaking. And OpenAI has expanded Codex beyond coding, connecting it to everyday apps and pushing it closer to the desktop superapp we’ve been tracking. Jason Hiner

IN TODAY’S NEWSLETTER

1. OpenAI turns Codex into a daily work agent

2. Fine-tuning AI may undo safety guardrails

3. Why AI search broke the marketing playbook

PRODUCTS

OpenAI pivots Codex to run your work day

One of the hottest vibe coding apps of 2026 just threw out its original playbook.

OpenAI's Codex started as the company's answer to Claude Code, giving hardcore developers an agent that could handle complex, long-running coding tasks with natural-language prompting. But the latest update widens its scope considerably, allowing ChatGPT users to connect Slack, Google Drive, email, calendar, and other everyday apps to automate daily tasks. Once you connect your accounts, it will even suggest ways it can help you.

OpenAI has been signaling for the past couple of months that it intends to transform Codex into a general desktop AI agent. In fact, since the hiring of OpenClaw founder Peter Steinberger, it's become clear that the company is going all-in on agents and that Codex would become the centerpiece of that strategy. Now, it's even more apparent that Codex will be the foundation of the superapp that OpenAI is building toward.

The company is also waging an online blitz to get the message across that this isn't just a coding app anymore. Here's some of the messaging on Twitter:

  • Sam Altman, CEO: "Big upgrade for codex today! Try it for non-coding computer work."

  • Greg Brockman, OpenAI president: "Codex is for everyone, for any task done with a computer."

  • Thibault Sottiaux, engineering lead for Codex: "This thing does more than what you think it does. Codex [is] now available for non-coders."

  • OpenAI Newsroom: "If you've used ChatGPT, you should use Codex. It's not only a powerful tool for daily tasks, automations, and coding, but also connects to your apps and interfaces with your Mac for incredible versatility."

In the press briefing for GPT-5.5, I asked Brockman about the rollout of the superapp, and he confirmed that it would happen in stages. This upgrade is plainly one of those stages. My biggest question now is how soon OpenAI will rename Codex. The name screams coding app, and this thing now has much bigger ambitions. When the company brings Codex, the ChatGPT app, and the ChatGPT Atlas browser together into one superapp, the branding will matter a great deal. Let's also keep in mind that the latest upgrade to Codex is simply catching up with Anthropic's Claude Cowork, which launched on January 12. But it's a welcome catch-up, for sure, especially when paired with GPT-5.5's expanded capabilities.

Jason Hiner, Editor-in-Chief

TOGETHER WITH HARNESS

What Actually Improves Developer Productivity in the Age of AI

AI coding assistants changed how teams build. But did they actually improve productivity?

At the Developer Experience Summit, hosted by Harness, leaders from Google Cloud, Okta, Siemens, and Morningstar share how they measure real productivity, reduce friction, and improve developer experience across the lifecycle.

Featuring Gene Kim, bestselling author of The Phoenix Project, Accelerate, and DevOps Handbook, on what high-performing teams look like in the AI era.

Join the free virtual event on May 13. Can't attend live? Register to receive the recording.

RESEARCH

Fine-tuning AI may undo safety guardrails

Fine-tuning general-purpose AI models has become standard practice, but it may come with some unintended consequences.

The Center for Democracy and Technology and researchers from MIT published a report finding that fine-tuning foundation language models can cause them to drift from their safety training in unpredictable ways. Even seemingly minor changes can lead these models to stray from the behavior they exhibited before modification.

Though a lot of safety research focuses on what happens when models are intentionally tampered with, a lot of misalignment and unsafe behavior can occur unintentionally, the report claims. “Part of the value proposition is that people can kind of take and modify and remix models,” Miranda Bogen, CDT Chief Technologist and an author of the report, told The Deep View. 

The researchers compared the safety characteristics of general, open-source base models with the fine-tuned versions found on Hugging Face, focusing on 31 models that were specifically fine-tuned for legal and medical use due to their involvement in “highly consequential decision-making.” Bogen said the researchers also chose models based on their popularity. Additionally, the researchers conducted fine-tuning experiments of their own to figure out which choices in the fine-tuning process affected model safety.

They discovered that fine-tuned models can experience “safety drift,” in which their safety behavior becomes stronger or weaker, even when the changes are small or the use cases are run-of-the-mill. The problem, however, is that the impacts are largely case-specific, the research finds. Safety impacts vary widely from model to model, and neither the amount of modification nor the fine-tuning method determines the exact effect on a model's safety.

Still, some of the testing produced concerning results: 

  • In one instance, a base model refused a request related to self-harm, instead redirecting the user to a crisis hotline. The medically fine-tuned variant of the model, meanwhile, generated detailed physiological guidance about suicide methods. 

  • Meanwhile, in legal contexts, a base model refused to draft a defamatory social media post about a judge without evidence. However, the legal fine-tuned model produced a “polished insinuation of corruption.”  

“It just underscores the importance of testing the system that's going to be deployed for safety considerations in that context of deployment,” said Bogen. 

The problem is that the current incentives in the broader market may not offer sufficient time or resources for developers to actually perform these kinds of safety tests in advance, Bogen said. “What is already happening is probably far from enough. At the moment, the hype and the competitive dynamics and the speed seem to be pushing people to move faster than what the evidence suggests is actually safe.”

While it may seem easy to write off this research because its results are so inconsistent, the fact of the matter is that these contexts, legal and medical, are ones in which safety and reliability are critical. Fine-tuning a model without knowing what impact it will have on the model's alignment isn't acceptable when you are dealing with things like legal documents or human health and well-being. And because organizations and model providers alike are eager to find use cases for these models in high-impact scenarios, it's more than likely that a fine-tuned model will be deployed into critical work with cracks in its hull.

Nat Rubio-Licht

TOGETHER WITH CHECKSUM

Your AI writes code faster. Who's testing it?

63% of engineering teams now ship code faster with AI. 72% have already had a production incident from AI-generated code. The bottleneck didn't disappear. It moved downstream.

Checksum is an AI-native continuous testing platform that auto-generates and self-heals your E2E test suite, runs inside your existing CI/CD pipeline, and keeps pace with the velocity you're already getting from AI coding agents.

Clearpoint Strategy saves $500k a year. Postilize ships 30% faster. Stellic reduced manual testing time by 40%.

Try it free right here.

CONSUMER

Why AI search broke the marketing playbook

Analysts, marketers, and business owners face the same question: How do you reach audiences in an era when AI is reshaping how people search and find information they need? That’s a top-of-mind issue for Pat Brown, SVP of global marketing, growth, analytics, and platform at Adobe.

Brown joins The Deep View Conversations to share his unique perspective on the issue. He was doing marketing science before it was even called that, and in his current role he is responsible for global marketing execution across various forms of media, so he puts these skills into practice every day.

Pat breaks down how AI is moving beyond content generation to reshape media workflows end-to-end, from audience measurement to agents that automate entire processes.

Topics covered:

  • The role of AI agents in automating marketing workflows

  • How AI in marketing can augment rather than replace creativity

  • AI-powered marketing tools that help analyze patterns and signals

  • Whether AI has “killed” SEO and what that means for marketers’ jobs

  • The rise of new terms like AEO and GEO in a “zero-click” search world

  • Why it’s still worth investing in marketing and brand strategy despite AI changes

  • Why today’s AI-driven SEO landscape feels like the early days of SEO

If you want to understand how AI is reshaping search and helping marketers connect more effectively with audiences, this conversation will leave you better informed.

Subscribe for weekly conversations with the leaders shaping the future of AI.

LINKS

  • Glean Waldo: Glean’s latest agentic search model that pairs with LLMs at 50% lower latency and 25% fewer tokens. 

  • Genspark for Word: An AI writing assistant for drafting, editing, formatting and research. 

  • Codex: The new Goal feature lets users assign a task, and the agent works continuously until the goal is met.

  • Codex: The agentic coding platform also got Pets.

  • Grok 4.3: xAI launched its latest AI model, which now includes a new voice cloning suite.

GAMES

Which image is real?

A QUICK POLL BEFORE YOU GO

Do you, or would you, let Codex or another agent handle your day-to-day work tasks?

The Deep View is written by Nat Rubio-Licht, Sabrina Ortiz, Jason Hiner, Faris Kojok and The Deep View crew. Please reply with any feedback.

Thanks for reading today’s edition of The Deep View! We’ll see you in the next one.

“There are small round prints in the carpet which suggests that the table was recently moved.”

“Lighting was so dark on the couch that I thought an AI generated picture would somewhat correct that.”

“It usually comes down to the focus in the image. In a real photo, not everything can be in focus at all depths.”

“Having a book in the scene with a clearly readable title seems to be a giveaway now that it is AI generated. The AI has used the Kinfolk title before in another recent post - I remember because I looked it up to see if it was a real book.”

“Inconsistent pattern on the pillow in [this image] gave it away.”

“The flower petals are too perfect in [this image].”

If you want to get in front of an audience of 750,000+ developers, business leaders and tech enthusiasts, get in touch with us here.