MBZUAI’s Institute of Foundation Models has announced PAN, a step toward interactive world models that reason about change rather than just generate pretty clips. It follows instructions like “drive through a snowy forest,” keeps objects and motion consistent, and links moments together so they make sense over time. The university says this could help robotics, autonomy, and decision-support research that depends on understanding cause and effect.

KEY TAKEAWAYS
  • MBZUAI’s IFM introduced PAN, a model that simulates and understands how the world changes over time. 
  • PAN follows natural-language instructions and keeps scenes coherent across longer timelines. 
  • It runs on a Generative Latent Prediction (GLP) framework that separates “what happens” from “how it looks.” 
  • Benchmarks show state-of-the-art results among open-source systems for action simulation, long-horizon forecasting, and simulative planning. 

What PAN is trying to solve

PAN aims to move past short, isolated video generation by modelling how scenes evolve, not just how they look at a single moment. 

  • Focuses on continuity across time
  • Follows plain-English instructions
  • Keeps objects, motion, and scene logic aligned

Most video tools today output a few seconds of footage that looks fine but doesn’t connect to anything before or after it. PAN goes after that missing thread. It combines visual understanding with reasoning, so an instruction like “walk toward the lighthouse” unfolds as a sensible sequence rather than a one-off clip.

For research and real-world systems, that kind of temporal coherence is the difference between a demo and something you can actually use.

How PAN keeps scenes consistent: GLP in plain English

At the core is Generative Latent Prediction (GLP). The model builds an internal memory of the scene, then renders it to short clips in a loop. 

  • Forms a latent state that remembers what exists and how it moves
  • Decodes that latent state into a brief video segment
  • Repeats the predict-and-decode cycle to maintain causal links over time

Think of GLP as splitting the job into two parts. First, PAN tracks the “story so far” in a structured latent state: objects, layout, motion, and the instruction you gave it. Then it turns that memory into a short video clip. By repeating this cycle, the visuals remain coherent and causally connected, so actions in frame one influence what you see ten seconds later.
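
To make that loop concrete, here is a deliberately toy Python sketch of a GLP-style predict-and-decode cycle. Everything in it (the ToyWorldModel class, the hashed “instruction encoder”, the update rule) is our own stand-in for illustration, not PAN’s actual architecture or API.

```python
import numpy as np

# Toy, hypothetical illustration of a GLP-style predict-and-decode loop.
# None of these names or components are PAN's real API; they stand in for
# the three steps listed above.

class ToyWorldModel:
    """Keeps a latent 'story so far' and renders it in short segments."""

    def __init__(self, latent_dim: int = 64):
        self.rng = np.random.default_rng(0)
        self.latent = np.zeros(latent_dim)  # structured scene memory

    def encode_instruction(self, instruction: str) -> np.ndarray:
        # Stand-in for a text encoder: hash the instruction into a vector.
        rng = np.random.default_rng(abs(hash(instruction)) % (2**32))
        return rng.standard_normal(self.latent.shape)

    def predict(self, instruction_vec: np.ndarray) -> None:
        # Advance the latent state: mostly prior memory, nudged by the goal.
        self.latent = 0.9 * self.latent + 0.1 * instruction_vec

    def decode(self, n_frames: int = 8) -> list[np.ndarray]:
        # Render a brief clip from the current latent state (toy "frames").
        return [self.latent + 0.01 * self.rng.standard_normal(self.latent.shape)
                for _ in range(n_frames)]

model = ToyWorldModel()
goal = model.encode_instruction("walk toward the lighthouse")
frames = []
for _ in range(5):                 # five predict-and-decode cycles
    model.predict(goal)            # 1. evolve the scene memory
    frames.extend(model.decode())  # 2. render a segment from that memory
print(f"{len(frames)} frames, all conditioned on the same evolving state")
```

The design point the sketch tries to show: because every clip is decoded from the same evolving latent state, what happens in an early cycle keeps shaping what gets rendered later.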

What the early results say

MBZUAI reports that PAN reaches state-of-the-art performance among open-source systems in three areas. 

  • Action simulation fidelity
  • Long-horizon forecasting
  • Simulative reasoning and planning

Action simulation fidelity asks whether the model’s rendering of movements is believable and consistent. Long-horizon forecasting checks whether it can predict what should happen further out, not just in the next frame. Simulative reasoning and planning assess whether the model can follow a plan and understand consequences over time.
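
That third item is the easiest to picture in code. Below is a minimal, hypothetical sketch of simulative planning in general: a world model lets an agent roll out candidate action sequences “in imagination” and keep the one whose predicted end state lands closest to a goal. The toy dynamics, action set, and scoring here are our assumptions for illustration, not how PAN is built or evaluated.

```python
import numpy as np

# Hypothetical sketch of simulative planning with a world model:
# roll out each candidate plan in latent space and keep the one whose
# predicted end state lands closest to the goal. The dynamics, actions,
# and goal vector are toy stand-ins, not PAN's real interface.

rng = np.random.default_rng(1)
DIM = 16
goal_state = rng.standard_normal(DIM)  # where we want the scene to end up
action_effects = {a: 0.3 * rng.standard_normal(DIM)
                  for a in ["forward", "left", "right", "wait"]}

def rollout(state: np.ndarray, plan: list[str]) -> np.ndarray:
    """Simulate a plan step by step, entirely inside the model."""
    for action in plan:
        state = 0.95 * state + action_effects[action]  # toy dynamics
    return state

start = np.zeros(DIM)
candidates = [list(rng.choice(list(action_effects), size=6))
              for _ in range(50)]  # 50 random six-step plans

best = min(candidates,
           key=lambda p: np.linalg.norm(rollout(start, p) - goal_state))
print("best plan:", best)
```

The point of the pattern is that trial and error happens inside the model; only the winning plan would ever touch a robot or a road.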

Hitting those marks matters for labs building agents, for developers shipping autonomy features, and for teams that want decision-support systems grounded in physical plausibility.

Why it matters for the UAE and beyond

PAN was introduced in Abu Dhabi on 13 November 2025, led by MBZUAI’s Institute of Foundation Models. IFM says its goal is to build rigorous, open foundation models with teams across Abu Dhabi, Paris and Silicon Valley.   

  • Signals continued investment in practical AI research in the UAE
  • Links to MBZUAI’s wider push on efficient models and agentic systems
  • Connects to an international network of researchers

The region is already leaning into agentic and efficient AI work, from initiatives like K2 Think to cloud infrastructure plays. PAN fits that picture by targeting the reasoning and planning gap that sits between flashy demos and systems you can trust. For recent context, see our coverage of K2 Think and the UAE’s growing AI stack.

Where to learn more

MBZUAI points to three resources for readers who want to dig into PAN. As the materials roll out, expect more technical detail and examples; we’ll update this piece as public artefacts become available.


FAQ

What is PAN in simple terms?

It’s a model from MBZUAI’s IFM that understands and simulates how a scene changes, keeping video sequences logical over time while following natural-language instructions. 

How is PAN different from typical video generators?

Most tools create short, isolated clips. PAN maintains continuity and causal links across time, so sequences feel connected and responsive to your instruction. 

What does “Generative Latent Prediction” mean?

PAN stores a structured memory of the scene (the latent state), predicts how it should evolve, then decodes it into short video segments. Repeating this keeps visuals coherent over longer horizons. 

What can PAN be used for?

The reported strengths point to robotics, autonomous systems, and decision-support research, where modelling actions and consequences matters. 

Who’s behind PAN?

MBZUAI’s Institute of Foundation Models, with teams spanning Abu Dhabi, Paris and Silicon Valley, focused on scientifically rigorous and open foundation models.