4 big questions for Apple's AI prove-it moment

Welcome back. A fascinating project powered by OpenAI forward-deployed engineers offers a glimpse of what may come after today's AI agents: systems that learn from their mistakes and steadily improve their own workflows. We also examine how physical AI is moving from sci-fi to the next platform race, and the stakes are bigger than robots that can fold laundry. And at WWDC 2026 on Monday, Apple gets another chance to prove it can reimagine Siri for the age of generative AI. The bigger question is whether Apple will catch up or differentiate. Jason Hiner

IN TODAY’S NEWSLETTER

1. 4 big questions for Apple's AI prove-it moment

2. Self-improving agents are AI's next act

3. Can open-source robotics win the physical AGI race?

CONSUMER

4 big questions to watch at Apple WWDC 2026

Okay, Apple, let's try this again. 

The company had its big AI reveal two years ago at WWDC 2024, but it got too far over its skis. It announced a slew of AI products and features that were delayed, never came to market, or were disappointing.

On Monday, at the keynote for WWDC 2026, we're going to hear Apple's new vision for integrating AI into the iPhone and the rest of the Apple ecosystem. This time we expect to hear about features that are much more fully baked, including the long-awaited transformation of Siri.

Apple's AI reboot comes almost a year after CEO Tim Cook rallied the company around playing a key role in the AI revolution, calling it "as big or bigger" than the internet and smartphones. "Apple must do this. Apple will do this. This is ours to grab. We will make the investment to do it," Cook told Apple employees at an all-hands meeting in August 2025, according to Bloomberg sources.

With Apple's AI trajectory on the line, here are the biggest questions to track at WWDC:

  • Can Siri meet heightened expectations? Siri needs to become nearly as capable as ChatGPT and Claude, and it needs to emerge as a standalone app that you can chat with. Turning Siri into an app that's highly usable and easy to navigate is Apple's bread and butter, and it's the best way for Apple to distinguish itself. But that will only be possible if Siri works as well as the leading chatbots on the market. Apple has enlisted Google Gemini to power the Apple Foundation Models that run Siri. Let's see if Siri merely catches up to what everyone else is doing or if it launches any truly unique features.

  • Where do Google, OpenAI, and Anthropic fit in? By not making its own frontier models, Apple has the opportunity to play a role similar to Perplexity, using models from multiple labs and orchestrating which model is best for the user's question or task. We already know Apple is partnering with Google and OpenAI. The big question is whether Anthropic models will join the party. Based on this tweet, it looks likely. Will they all have equal footing inside the new Siri? Let's see how that plays out.

  • How will Apple integrate agents? Apple catching up on chatbots is one thing, but the AI industry is rapidly moving away from chatbots and embracing AI agents as the next stage of AI. The Information reported that Apple will launch an AI agent marketplace on the App Store to allow companies and developers to launch agents that strictly adhere to Apple's privacy and security policies. That could be a win.

  • Will there be a hardware surprise? There is no hardware currently rumored to launch at this year's WWDC, where Apple typically launches at least one new device, usually a Mac. However, we do occasionally get surprised at WWDC by hardware that flew under the radar. If that were to happen this year, the best candidates would be a new smart speaker running the overhauled Siri, a teaser for Apple smart glasses coming next year, or new Mac mini and Mac Studio hardware powered by the M5 chip. Still, all of these look like long shots.

Let's also remember that no matter what happens with Siri, Apple is still winning big in AI with Apple silicon, as I recently detailed in an exclusive interview with Apple.

It's easy to get stuck on Apple falling behind in the chatbot race. After all, it's been two years since Apple promised a revamped Siri running on AI, and it has yet to materialize. And there were many years before that when Siri languished in a sea of unfulfilled expectations. Siri needs to deliver this time, and with all the attention Apple has dedicated to the Siri upgrade over the past year, I think we can reasonably expect that the AI assistant will catch up on basic features. The bigger question is whether Apple simply tries to deliver basic features and win on interface and integration, or whether it tries to push the boundaries and deliver truly unique AI features. The Deep View's Sabrina Ortiz and I will be on the ground at WWDC 2026 to find out. You can follow my analysis of everything announced in real time on X/Twitter at x.com/jasonhiner

Jason Hiner, Editor-in-Chief

TOGETHER WITH MODE MOBILE

Buffett's Rule Could Make This Company Soar

Warren Buffett famously said that "If you don't find a way to make money while you sleep, you will work until you die." But what if your phone could do it for you? 

That's exactly what Mode Mobile has created: technology that turns idle phone time into passive income. With 490M+ users in their ecosystem and $1B+ in earnings and savings, their EarnPhone is being called the Uber of smartphones.

With 32,481% revenue growth in three years, they were named the fastest-growing software company in 2023 by Deloitte, and with 7 billion smartphones worldwide their market could be significantly larger than Uber’s. They’ve just secured their Nasdaq stock ticker $MODE, and you have a limited opportunity to invest in their pre-IPO offering at $0.52/share.

RESEARCH

Self-improving agents are AI's next act

Thrive Holdings used AI to cut tax document prep time by a third while maintaining up to 97% accuracy. But the real story is what happened next: The system kept improving itself.

OpenAI forward-deployed engineers (FDEs) worked with Thrive Holdings’ engineers to build Tax AI, an agent that helps automate the process of preparing 1040 and 1041 tax returns. Initially, Tax AI was built for simpler work, such as ingesting W-2s and 1099s, but as the tax season went on, it was able to self-improve and handle more complex tasks, Arthur Fernandes Araujo, an OpenAI FDE lead on the project, told The Deep View.

"If you think about AI as an ever more capable coworker that is augmenting you, you want a coworker that is capable of, given some input on what it did incorrectly, having the memory of it and not repeating the error," Araujo said.

OpenAI quantified those improvements by measuring the percentage of tax returns with accurate field completion, meaning all boxes on the document were filled out correctly. At launch, only a quarter of tax returns had 75% or higher field completion. Within six weeks, 86% of returns hit that mark. Eventually, the system improved further, with 90% of returns hitting 100% correct field completion, OpenAI said in a release.
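To make the metric concrete, here is a minimal sketch of how the share of returns meeting a field-completion threshold could be computed. The `ReturnResult` class, its field names, and the sample numbers are all hypothetical and chosen for illustration; OpenAI has not published its scoring code, and this is not its implementation.

```python
from dataclasses import dataclass


@dataclass
class ReturnResult:
    """Hypothetical record of one prepared return after practitioner review."""
    return_id: str
    fields_total: int
    fields_correct: int

    @property
    def completion(self) -> float:
        # Fraction of fields the agent filled in correctly.
        return self.fields_correct / self.fields_total if self.fields_total else 0.0


def share_meeting_threshold(results, threshold):
    """Share of returns whose field completion is at or above `threshold`."""
    if not results:
        return 0.0
    hits = sum(1 for r in results if r.completion >= threshold)
    return hits / len(results)


# Made-up sample data, not figures from the project.
sample = [
    ReturnResult("r1", 40, 40),  # 100% complete
    ReturnResult("r2", 40, 30),  # 75% complete
    ReturnResult("r3", 40, 20),  # 50% complete
    ReturnResult("r4", 40, 36),  # 90% complete
]

share_75 = share_meeting_threshold(sample, 0.75)
share_100 = share_meeting_threshold(sample, 1.0)
print(f"{share_75:.0%} of returns at 75%+ completion")
print(f"{share_100:.0%} of returns at 100% completion")
```

Tracking the same two thresholds week over week would show the kind of movement described above, from a quarter of returns clearing 75% at launch to most of them doing so later.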

So how does it work? To get better, the system uses a three-part loop: 

  • First, a practitioner reveals errors and steers what the product learns

  • Then the system tracks the full process beyond inputs and outputs to convert the corrections into evals

  • Finally, a Codex-driven improvement loop allows the system to build on the evals 

For instance, in the blog post, OpenAI outlined a rental property income example in which their Tax AI system must extract Schedule E fields from messy source materials, such as handwritten notes, emails, and spreadsheets, then map them to a tax engine for practitioner review. When errors are found, practitioner corrections are captured as structured data, grouped into recurring failure patterns, and fed to Codex as scoped engineering tasks. 
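The middle of that loop, turning practitioner corrections into grouped failure patterns and scoped tasks, can be sketched roughly as follows. This is an illustration under stated assumptions, not OpenAI's or Thrive Holdings' code: the `Correction` fields, the grouping key, the `min_count` cutoff, and the task format are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    """Hypothetical structured record of one practitioner correction."""
    return_id: str
    form: str         # e.g. "Schedule E"
    field: str        # e.g. "rents_received"
    predicted: str    # what the agent produced
    corrected: str    # what the practitioner changed it to
    source_type: str  # e.g. "handwritten_note", "email", "spreadsheet"


def group_by_pattern(corrections):
    """Group corrections that share a form, field, and source type."""
    groups = defaultdict(list)
    for c in corrections:
        groups[(c.form, c.field, c.source_type)].append(c)
    return dict(groups)


def to_eval_cases(corrections):
    """Each correction becomes a regression case with a known right answer."""
    return [
        {"return_id": c.return_id, "field": f"{c.form}:{c.field}", "expected": c.corrected}
        for c in corrections
    ]


def scoped_tasks(corrections, min_count=2):
    """Emit one task per recurring pattern, most frequent first."""
    tasks = []
    for (form, field, source_type), items in group_by_pattern(corrections).items():
        if len(items) < min_count:
            continue  # treat one-off errors as noise in this sketch
        tasks.append({
            "title": f"Fix {form} {field} extraction from {source_type}",
            "occurrences": len(items),
            "evals": to_eval_cases(items),
        })
    return sorted(tasks, key=lambda t: t["occurrences"], reverse=True)


# Made-up corrections for illustration only.
corrections = [
    Correction("r1", "Schedule E", "rents_received", "1200", "12000", "handwritten_note"),
    Correction("r2", "Schedule E", "rents_received", "950", "9500", "handwritten_note"),
    Correction("r3", "Schedule E", "repairs", "300", "350", "spreadsheet"),
]

tasks = scoped_tasks(corrections)
for task in tasks:
    print(task["title"], "-", task["occurrences"], "occurrences")
```

In this sketch, each task carries its own eval cases, so a coding agent that picks up the task has a concrete check for whether its change fixed the recurring error.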

In this project, what’s "self‑improving" is the harness around the model, not the underlying model itself. The tax agent is built on OpenAI’s Codex harness, and it’s the harness that’s being iteratively improved based on practitioner feedback. Because the Codex harness is open source, other developers can also build similar self‑improving agents on top of it.

Also notable is that the same three-part design can apply to workflows in other domains and industries and is "quite generalizable," John de Wasseige, one of the FDE leads on this project, told The Deep View. For example, OpenAI says it is working with Thrive Holdings to apply it to accounting workflows, such as booking, audit and operational workflows.

Ultimately, one of the biggest challenges with any AI model is accuracy, a concern that becomes even more critical in enterprise use cases. For these models to earn genuine trust, they need to perform as closely as possible to a human expert. One hallmark of real-world training is the ability to correct someone's behavior so they don't repeat the same mistake. The Tax AI system appears to mimic that dynamic, and its potential to extend that feedback loop to other forms of knowledge work makes it a noteworthy advancement in the space. OpenAI said the broader purpose of publishing the article was to provide developers with a blueprint for building on these ideas and a constructive mindset to push the technology forward.

TOGETHER WITH MODE MOBILE

The Next Uber? This Shark Tank Investor Thinks So

Imagine turning down Uber at a valuation of $10 million, only to watch it go public at over $80 billion. That’s exactly what happened to Mark Cuban. 

But fellow Shark Tank investor Kevin Harrington may have learned from Cuban’s mistake, investing early in the Uber of Smartphones.

Mode Mobile turns smartphones into income-generating assets, like Uber did with cars. They’ve generated $115M in revenue and helped users earn and save +$1B. 

GOVERNANCE

Can open-source robotics win the physical AGI race?

The robotics industry has a specific dream with far-reaching consequences: a robot brain that can do anything a human asks it to. 

It's the driving force behind Generalist, a company with a goal of creating "physical AGI" that's useful to anyone. Last week, the Nvidia-backed startup announced a $400 million funding round, bringing its valuation to $2 billion. The startup joins the likes of Physical Intelligence, Rhoda AI, and Skild AI in attracting high-profile investors to the ambitious mission of creating robots capable of learning tasks like humans. 

But one startup may want to change the nature of this race, shifting the work from competition to collaboration. Luma, a video AI and world model startup aimed at creating "multimodal AGI," last week announced the Open Physical AI Lab, a collaborative initiative aimed at solving generalization to "maximally benefit all of humanity," CEO Amit Jain said in a post on X.

Luma and other startups are deeply focused on solving this because the scale of generalized robots could be massive. In an interview, Jain told The Deep View that if generalization is achieved and robots are capable of operating alongside humans at work, in businesses and at home, "they're going to become the means of production." 

But generalization is an incredibly hard problem to solve, and a lot stands in the way of achieving it. One of the most notable challenges is the massive gap in real-world data needed to train robotics models to interact with the real world. Additionally, these models require far more compute than conventional language models. 

Opening this research to a wider audience could accelerate innovation and close those gaps more quickly. And without pooling resources, "robotics and generalization cannot be solved," Jain told The Deep View. 

However, Jain said there may be a greater risk of this work being siloed within individual companies. The race to generalization in physical AI mirrors the race to build AGI at Google, OpenAI, and Anthropic. If the work isn't done in an open-source manner, Jain said, the industry risks the centralization of this technology.

And if these robots reach the potential the industry is expecting, "There is no situation where we imagine a future where the means of production are controlled by one or two or three companies," said Jain. "Groups and institutions and nations should be in control of means of production."

It's clear that physical AI is one of the tech industry's next frontiers, with the starry-eyed goal of creating generalized robots that can perform whatever human-like tasks we ask them to. It's why these young startups are notching billion-dollar valuations from major investors. It's also why OpenAI has taken an interest in robotics, seeking to recruit engineers to its own robotics division. However, Jain's warning is also clear: Centralizing power in robotic generalization could be as risky as, if not riskier than, concentrating powerful AI in the hands of too few companies. And while OpenAI's Sam Altman and Greg Brockman have both recently called for the democratization of the technology, with a major OpenAI IPO on the horizon and so much money on the line, it's important that governments, institutions and the global community continue to hold these companies to the promise that this technology be developed in a way that will broadly benefit humanity.

Nat Rubio-Licht

LINKS

  • OpenAI Lockdown Mode: An optional OpenAI security setting to prevent prompt injection and data exfiltration attacks.

  • Manus Shopify Connector: The AI agent app now lets owners build storefronts, manage product catalogs, and generate campaigns using chat. 

  • Dreambeans: A new product from Google Labs that gives users proactive, personalized collections of stories. 

  • Perplexity: Nvidia's open source Nemotron 3 Ultra is now available for Perplexity Pro and Max subscribers on the flagship platform and Computer.

GAMES

Which image is real?


A QUICK POLL BEFORE YOU GO

Would you trust AI agents to handle finance tasks, such as tax preparation?


The Deep View is written by Nat Rubio-Licht, Sabrina Ortiz, Jason Hiner, Faris Kojok and The Deep View crew. Please reply with any feedback.

Thanks for reading today’s edition of The Deep View! We’ll see you in the next one.

“[This image] because of the patio door. It is a more realistic window shape.”


“The sunlight areas in the interior are over exposed. I don't think AI would make this error.”


“More shadows and reflections on the window - not so perfect.”

“The shadows in [this image] didn’t look right.”


“Shadows on the floor looked off and the greenery out the window on the left side was blurred in a way I didn't see a reason for.”

“A main prop in [this image] seemed out of focus.”

If you want to get in front of an audience of 750,000+ developers, business leaders and tech enthusiasts, get in touch with us here.