MBZUAI's PAN Arcadia redefines the world model

Welcome back. Announced today, MBZUAI’s PAN Arcadia is a new kind of world model built for real-time interaction. It signals a shift from passive simulation to active, real-time environments, and hints at how world models and LLMs could evolve together into a new AI stack. —Jason Hiner
1. MBZUAI's PAN Arcadia redefines the world model
2. A world model that can react and not just simulate
3. A future where world models meet LLMs
PRODUCTS
PAN Arcadia recasts what world models can do

World models have become one of the most closely watched trends in AI this year. But most of the attention has centered on how realistic they look. MBZUAI is pushing a different question: What if they could generate visuals in real time as you interact with them?
On Wednesday, the Institute of Foundation Models (IFM) at MBZUAI released PAN Arcadia, a research preview of a real-time interactive world model. Instead of optimizing for cinematic fidelity in a fixed clip, the system is designed to respond to user input as it happens, turning a static image into something closer to a navigable environment.
“You can imagine it could be something like a game,” Zihan Liu, technical lead at IFM, told The Deep View. “You can really control the character inside the game and how the game progresses… You can just upload the image and then take control.”
In the current preview, users choose from a curated set of images and steer an agent through the scene using simple keyboard controls. As the user moves, the model keeps generating the surrounding environment, attempting to maintain a coherent world in response to each movement. Sessions are currently capped at 30 seconds, and broader capabilities, including user image uploads, are planned for future releases.
But it's easy to imagine a future where you can upload photos of your favorite places and memories to turn them into fully interactive 3D environments. That alone is likely to pique the interest of plenty of people.
That interaction loop points to a shift in how world models are being developed. Much of the field has optimized for visual coherence and generation quality. PAN Arcadia explores a different tradeoff. According to MBZUAI, the model’s visual performance is comparable to other frontier-class world models, but it prioritizes responsiveness and continuity under user control over isolated, polished demos.
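MBZUAI hasn't published PAN Arcadia's interface, but the pattern the demo describes maps onto a simple loop: read a keypress, translate it into an action, and ask the model for the next frame conditioned on everything generated so far. Here's a minimal Python sketch of that loop; `generate_next_frame` is a hypothetical stand-in for the model call, not a real API.

```python
# Minimal sketch of an action-conditioned interaction loop.
# `model.generate_next_frame` is hypothetical; PAN Arcadia's API is unpublished.
KEY_TO_ACTION = {"w": "forward", "s": "back", "a": "left", "d": "right"}

def interactive_session(model, start_image, max_steps=30):
    """Turn a single image into a navigable scene, one keypress at a time."""
    frames = [start_image]  # full history keeps the generated world coherent
    for _ in range(max_steps):
        key = input("move (w/a/s/d, q to quit): ").strip().lower()
        if key == "q":
            break
        action = KEY_TO_ACTION.get(key)
        if action is None:
            continue  # ignore unmapped keys
        # Condition on every frame so far plus the new action, so the scene
        # stays consistent as the user steers through it.
        frames.append(model.generate_next_frame(frames, action))
    return frames
```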
TOGETHER WITH NEBIUS
Run open-source LLMs in real production
Capture live traffic, fine-tune and optimize, then deploy your own checkpoints to dedicated GPU endpoints.
Choose hardware, set scaling limits, and select a region. Enjoy stable latency, predictable costs, and clear data residency.
PRODUCTS
A world model that can react and not just simulate

Most world model demos follow a familiar pattern. A prompt goes in, a clip comes out, and the system is judged on how convincing that clip looks. PAN Arcadia changes the terms of the test. The question is no longer whether a model can generate a compelling scene once. It is whether it can sustain a coherent world while a user is actively pushing it in different directions.
That is a harder problem. The model is no longer optimizing for a single output. It has to maintain continuity across a sequence of actions that it cannot fully predict. Each movement forces the system to update the scene and generate a plausible continuation in real time. The experience depends as much on stability as it does on visual quality. If the world breaks under interaction, the illusion collapses.
To support that, IFM trained PAN Arcadia on footage with strong, intentional motion, including city walks and synthetic data where an agent moves through environments with a clear objective. This shifts the learning problem. Instead of focusing on how scenes look, the model learns how scenes change. That distinction becomes critical once a human is in the loop.
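PAN Arcadia's training code hasn't been released, so treat this as a generic sketch of the kind of objective that description implies: given the frames so far and the action taken, predict the next frame. The tensor shapes and model signature here are assumptions, not MBZUAI's actual setup.

```python
import torch.nn.functional as F

def training_step(model, optimizer, batch):
    """One step of action-conditioned next-frame prediction (assumed setup).

    batch["frames"]:  (B, T, C, H, W) clips with strong, intentional motion
    batch["actions"]: (B, T) the move taken at each step (scripted or inferred)
    """
    frames, actions = batch["frames"], batch["actions"]
    context, target = frames[:, :-1], frames[:, 1:]
    # The supervision signal is how the scene *changes* under each action,
    # not just how any single frame looks.
    predicted = model(context, actions[:, :-1])
    loss = F.mse_loss(predicted, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```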
One of the more interesting signals is how the model handles variation. In examples shared with The Deep View, even simple hand-drawn images could be turned into navigable environments that preserve their underlying structure. That suggests the system is picking up on spatial relationships and transitions that carry over across styles, rather than just reproducing a specific visual pattern.
At the same time, MBZUAI is explicit about what the system is not doing. PAN Arcadia does not attempt fine-grained motor control or detailed physical simulation. It does not model joint angles or precise movement. Instead, it operates at a higher level, focused on cause and effect, navigation, and decision-making. In that sense, it feels less like a game controller and more like a planning environment. That boundary matters because it defines where the model fits in a larger stack.
TOGETHER WITH DEEL
Where AI delivers in global HR (and where it falls short)
AI is reshaping HR operations, but global complexity adds a layer most guides ignore. Expanding into new markets means navigating payroll rules, local labor law, and compliance risk across every region you operate in, and generic AI tools weren't built for that.
Deel's The Role of AI in HR for Global Organizations covers where AI drives real value in global HR, the governance controls you need, and what should always stay human.
PRODUCTS
A future where world models meet LLMs

That positioning connects directly to one of the more practical arguments for world models: their role as training environments. Building real-world training setups for physical AI is expensive and constrained. Systems fail repeatedly in the early stages, creating costs, risks, and limited opportunities for variation. Even well-resourced teams struggle to get exposure to the diversity of environments needed for robust performance.
Virtual environments offer a way around that constraint. If a model can turn an image into an interactive space, it becomes possible to generate many variations of that environment and run repeated experiments without the physical overhead. Agents can explore more scenarios, fail safely, and accumulate experience at a scale that is difficult to match in the real world.
“Building that very diverse training ground is almost impossible, [but] it’s totally possible in the virtual world,” Liu said.
PAN Arcadia is an early attempt to make that idea concrete. It is not a full training system, but it lowers the barrier to creating interactive environments that agents can operate inside. A single scene can be explored in multiple ways, creating a broader distribution of experiences from limited input. That is the core promise behind world models as a training layer.
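In code, the pitch reduces to something like the sketch below: spawn several generated variants of a scene, roll an agent through each, and keep the trajectories. The `spawn` and `step` calls are hypothetical stand-ins for an interactive world-model API, not something MBZUAI has shipped.

```python
def collect_experience(world_model, policy, seed_image, n_variations=8, horizon=30):
    """Gather agent trajectories across generated variants of one scene."""
    trajectories = []
    for seed in range(n_variations):
        # Each seed yields a different plausible variant of the same scene,
        # widening the distribution of training experience from one image.
        env = world_model.spawn(seed_image, seed=seed)  # hypothetical call
        obs, traj = env.observation, []
        for _ in range(horizon):
            action = policy(obs)
            obs, done = world_model.step(env, action)   # hypothetical call
            traj.append((obs, action))
            if done:
                break  # failure is cheap here; just move to the next variant
        trajectories.append(traj)
    return trajectories
```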
It also reflects how MBZUAI is structuring its broader research effort. Alongside PAN, the institute is developing a reasoning model called K2 Think, making it one of the few labs to have developed both a reasoning model and a world model. The working thesis is that these systems need to evolve together. The reasoning model handles planning and decision-making. The world model provides an environment where those decisions can be tested and refined. Each improves the other over time.
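To make that cycle concrete, here's a rough sketch of how a planner and a world model could fit together; `propose`, `rollout`, and `score` are all hypothetical names, since neither lab has detailed the integration publicly.

```python
def plan_with_world_model(reasoner, world_model, goal, start_obs,
                          n_candidates=4, horizon=10):
    """Reasoning model proposes plans; world model tests them in simulation."""
    # The reasoning model drafts several candidate action sequences.
    candidates = reasoner.propose(goal, start_obs, n=n_candidates)
    best_plan, best_score = None, float("-inf")
    for plan in candidates:
        # The world model rolls each plan forward to see where it leads...
        outcome = world_model.rollout(start_obs, plan, horizon=horizon)
        # ...and the predicted outcome is scored against the goal.
        score = world_model.score(outcome, goal)
        if score > best_score:
            best_plan, best_score = plan, score
    return best_plan  # executed or refined, feeding both models over time
```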
That loop is still early, but it points to a larger move. Instead of treating models as standalone tools, the focus moves toward systems that combine reasoning and simulation in a continuous cycle. In that context, PAN Arcadia is less interesting as a product than as an indicator of where world models and LLMs could converge.
LINKS

Google signs deal with DOD for use of Gemini models
Amazon and OpenAI deal makes OpenAI models available via AWS
Following China's ban, Meta prepares to undo Manus AI acquisition
Study shows 35% of newly published websites since ChatGPT launch are AI-generated
Taylor Swift files to protect voice, likeness from AI misuse
Musk, Altman trial begins in federal court in Oakland, California

Talkie: Researchers launch a 13B vintage language model from 1930
Claude: New creative work tool connectors include Adobe and Affinity by Canva
YouTube: Google is testing an AI interactive search feature for premium subscribers
Claude Code: Can send push notifications to your phone when a long task finishes
Exa: Partnered with Google to offer Grounding With Exa inside of Gemini models

Georgetown University: Frontier AI Research Lead
DataCamp: Principal AI Engineer - AI Tutor
PwC: AI-Native Engineering Lead, Full Stack - Manager
Columbia University: Staff Associate - AI in Business Initiative
POLL RESULTS
Have you seen The AI Doc, the new film about the AI industry?
Yes (6%)
No (89%)
Other (5%)
The Deep View is written by Nat Rubio-Licht, Sabrina Ortiz, Jason Hiner, Faris Kojok and The Deep View crew. Please reply with any feedback.

Thanks for reading today’s edition of The Deep View! We’ll see you in the next one.



If you want to get in front of an audience of 750,000+ developers, business leaders and tech enthusiasts, get in touch with us here.







