
World Models + Planning

Last edited: 2026-01-28
| Dimension | Assessment | Evidence |
|---|---|---|
| Current Dominance | Low (games/robotics only) | MuZero, DreamerV3 superhuman in games; limited general task success |
| TAI Probability | 5-15% | Strong for structured domains; LLMs dominate general tasks |
| Sample Efficiency | 10-500x better than model-free | EfficientZero: 194% human performance with 100k steps (DQN needs 50M) |
| Interpretability | Partial | World model predictions inspectable; learned representations opaque |
| Compute at Inference | High | AlphaZero: 80k positions/sec vs Stockfish's 70M; relies on MCTS search |
| Scalability | Uncertain | Games proven; real-world complexity unproven at scale |
| Key Advocate | Yann LeCun (Meta) | Argues LLMs are a "dead end"; JEPA is the path to AGI |

World models + planning represents an AI architecture paradigm fundamentally different from large language models. Instead of learning to directly produce outputs from inputs, these systems learn an explicit model of how the world works and use search/planning algorithms to find good actions.

This is the paradigm behind AlphaGo, MuZero, and the approach Yann LeCun advocates with JEPA (Joint-Embedding Predictive Architectures). The key idea: separate world understanding from decision making.

Estimated probability of being dominant at transformative AI: 5-15%. Powerful for structured domains but not yet competitive for general tasks. MuZero (Nature, 2020) achieved superhuman performance across Go, chess, shogi, and 57 Atari games without knowing game rules. DreamerV3 (2023) became the first algorithm to collect diamonds in Minecraft from scratch, demonstrating generalization across 150+ diverse tasks with fixed hyperparameters.

| Component | Function | Learnable |
|---|---|---|
| State Encoder | Compress observations to latent state | Yes |
| Dynamics Model | Predict how state changes with actions | Yes |
| Reward Model | Predict rewards/values | Yes |
| Policy Network | Propose likely good actions | Yes |
| Value Network | Estimate long-term value | Yes |
| Search/Planning | Find best action via lookahead | Algorithm (not learned) |
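These components compose into a simple loop: encode the observation, roll candidate action sequences forward through the learned dynamics and reward models, and return the first action of the best sequence. The sketch below is a minimal random-shooting planner with toy hand-coded stand-ins for the learned networks (all function names here are illustrative, not from any real system):

```python
# Minimal latent-space planning sketch. The "learned" models are toy
# hand-coded functions standing in for trained networks.
import random

def encode(obs):                       # state encoder: observation -> latent
    return float(obs)

def dynamics(state, action):           # dynamics model: predict next latent
    return state + (1.0 if action == "right" else -1.0)

def reward(state, action):             # reward model: closer to goal 5.0 is better
    return -abs(dynamics(state, action) - 5.0)

def plan(obs, actions=("left", "right"), horizon=3, n_rollouts=64, seed=0):
    """Random-shooting planner: simulate candidate action sequences inside
    the learned model and return the first action of the best sequence."""
    rng = random.Random(seed)
    state0 = encode(obs)
    best_seq, best_ret = None, float("-inf")
    for _ in range(n_rollouts):
        seq = [rng.choice(actions) for _ in range(horizon)]
        state, ret = state0, 0.0
        for a in seq:
            ret += reward(state, a)
            state = dynamics(state, a)
        if ret > best_ret:
            best_seq, best_ret = seq, ret
    return best_seq[0]

print(plan(0.0))  # moving right approaches the goal at 5.0
```

Real systems replace random shooting with MCTS (MuZero) or gradient-based rollouts in a recurrent latent model (Dreamer), but the separation of learned models from a non-learned search procedure is the same.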
| Property | Rating | Assessment |
|---|---|---|
| White-box Access | PARTIAL | World model inspectable but learned representations are opaque |
| Trainability | HIGH | Model-based RL, self-play, gradient descent |
| Predictability | MEDIUM | Explicit planning, but world model errors compound |
| Modularity | MEDIUM | Clear separation of world model, policy, value |
| Formal Verifiability | PARTIAL | Planning algorithm verifiable; world model less so |
| Advantage | Explanation |
|---|---|
| Explicit goals | Planning objectives are visible |
| Inspectable beliefs | Can examine what the world model predicts |
| Separable components | Can analyze dynamics, rewards, policy separately |
| Bounded search | Planning depth is controllable |
| Model-based interpretability | Can ask "what if" questions of the world model |
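The "what if" advantage is concrete: because the dynamics model is an explicit function, counterfactual action sequences can be rolled forward without ever executing them. A hypothetical sketch (toy one-dimensional model; names are illustrative):

```python
# "What if" interrogation of an explicit world model: roll a counterfactual
# action sequence forward and inspect the predicted trajectory.
def dynamics(pos, action):             # toy learned model over 1-D position
    return pos + {"left": -1.0, "stay": 0.0, "right": 1.0}[action]

def what_if(pos, actions):
    """Return the predicted state trajectory under a counterfactual plan."""
    trajectory = [pos]
    for a in actions:
        pos = dynamics(pos, a)
        trajectory.append(pos)
    return trajectory

print(what_if(0.0, ["right", "right", "stay"]))  # [0.0, 1.0, 2.0, 2.0]
```

No comparable query exists for a model-free policy, whose beliefs about consequences are implicit in its weights.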
| Risk | Severity | Explanation |
|---|---|---|
| Goal misgeneralization | HIGH | Reward model may capture the wrong objective |
| World model errors | MEDIUM | Errors compound over the planning horizon |
| Mesa-optimization | HIGH | Planning is explicitly optimization |
| Deceptive world models | UNKNOWN | Could the world model learn to deceive the planner? |
| Instrumental convergence | MEDIUM | Planning may discover dangerous strategies |
| System | Developer | Year | Domain | Key Achievement | Performance |
|---|---|---|---|---|---|
| AlphaGo Zero | DeepMind | 2017 | Go | Tabula rasa superhuman | 100-0 vs AlphaGo Lee |
| AlphaZero | DeepMind | 2018 | Chess, Shogi, Go | 24hr to superhuman | 155 wins, 6 losses vs Stockfish (1000 games) |
| MuZero | DeepMind | 2020 | Games (no rules given) | Learned dynamics model | SOTA on 57 Atari games |
| DreamerV3 | DeepMind/Toronto | 2023 | 150+ diverse tasks | First Minecraft diamonds from scratch | SOTA on 4 benchmarks |
| EfficientZero | Tsinghua | 2021 | Atari 100k | 500x more sample-efficient than DQN | 194% mean human performance |

Robotics and Real-World Applications (2024)

| System | Application | Key Innovation | Status |
|---|---|---|---|
| MoDem-V2 | Robot manipulation | Recurrent State-Space Model for collision-free planning | ICRA 2024 |
| DreMa | Robotic imitation | Learnable digital twins with Gaussian Splatting | Research |
| 4D Latent World Model | 3D robot planning | Sparse voxel latent space for geometric reasoning | Research |
| UniSim | Interactive simulation | Real-world simulators from video | ICLR 2024 Outstanding Paper |
| System | Developer | Scope | Status | Key Claim |
|---|---|---|---|---|
| JEPA | Meta (LeCun) | General intelligence | I-JEPA, V-JEPA released | Predicts embeddings, not pixels |
| V-JEPA | Meta | Video understanding | Released 2024 | Physical world model |
| Genie | DeepMind | World generation | Research | Interactive worlds from video |
| Sora | OpenAI | Video prediction | Limited release | Implicit world model debate |
| Paper | Year | Venue | Contribution | Citations (2025) |
|---|---|---|---|---|
| World Models (Ha & Schmidhuber) | 2018 | NeurIPS | VAE + RNN architecture; training in "dreams" | 2,500+ |
| AlphaGo Zero | 2017 | Nature | Tabula rasa Go mastery | 8,000+ |
| AlphaZero | 2018 | Science | General game mastery | 5,000+ |
| MuZero | 2020 | Nature | Learned dynamics without rules | 3,000+ |
| DreamerV3 | 2023 | JMLR | Fixed hyperparameters across domains | 500+ |
| EfficientZero | 2021 | NeurIPS | 500x sample efficiency improvement | 400+ |
| A Path Towards Autonomous Machine Intelligence (LeCun) | 2022 | Position paper | JEPA framework proposal | 1,000+ |
| Entity | Focus | Notable Contributions |
|---|---|---|
| DeepMind | Game AI, planning | AlphaGo/Zero, MuZero, DreamerV3, Genie |
| Meta FAIR | JEPA, self-supervised learning | I-JEPA, V-JEPA, VL-JEPA |
| Yann LeCun | Architectural advocacy | Argues LLMs lack world understanding |
| Danijar Hafner | Dreamer series | Lead author of DreamerV1/V2/V3 |
| Tsinghua | Sample efficiency | EfficientZero, EfficientZero V2 |
| Berkeley | Robotics | Model-based control, MPC |

Yann LeCun, Meta’s VP and Chief AI Scientist, has been the most vocal advocate for world models as the path to AGI. His 2022 position paper “A Path Towards Autonomous Machine Intelligence” argues:

  1. LLMs are a “dead end” - Autoregressive token prediction does not produce genuine world understanding
  2. Prediction in embedding space - JEPA predicts high-level representations, not raw pixels/tokens
  3. Six-module architecture - Perception, world model, cost, memory, action, configurator
  4. Hierarchical planning - Multiple abstraction levels needed for complex tasks

JEPA (Joint Embedding Predictive Architecture) differs fundamentally from generative models:

| Aspect | Generative (LLMs, Diffusion) | JEPA |
|---|---|---|
| Prediction target | Raw tokens/pixels | Abstract embeddings |
| Uncertainty handling | Must model all details | Ignores irrelevant variation |
| Training signal | Reconstruction loss | Contrastive/predictive loss |
| Information focus | Surface-level patterns | Semantic structure |

Meta has released three JEPA implementations: I-JEPA (images, 2023), V-JEPA (video, 2024), and VL-JEPA (vision-language, 2024).
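The core distinction in the table above can be made numeric with a toy sketch. This is an illustrative simplification, not Meta's code: real JEPA systems use deep encoders, masking, and an EMA target encoder, but the principle that the loss lives in embedding space rather than pixel space is the same:

```python
# Toy JEPA-style objective: predict the target's *embedding*, not its pixels.
def embed(x, w):
    """Linear stand-in for a learned encoder: raw vector -> one latent number."""
    return sum(wi * xi for wi, xi in zip(w, x))

def jepa_loss(context, target, w_enc, w_pred):
    """Loss measured in latent space: pixel-level variation that the
    encoder discards never has to be reconstructed."""
    z_ctx = embed(context, w_enc)
    z_tgt = embed(target, w_enc)       # in practice an EMA "target" encoder
    z_hat = w_pred * z_ctx             # tiny linear predictor
    return (z_hat - z_tgt) ** 2

def reconstruction_loss(pred_pixels, target_pixels):
    """A generative loss, by contrast, penalizes every pixel difference."""
    return sum((p - t) ** 2 for p, t in zip(pred_pixels, target_pixels))

# Two inputs the encoder maps to the same embedding: zero JEPA loss,
# nonzero pixel reconstruction loss.
print(jepa_loss([1, 0], [0, 1], [2, 2], 1.0))        # 0.0
print(reconstruction_loss([1, 0], [0, 1]))           # 2
```

The final two lines show why JEPA can "ignore irrelevant variation": inputs that differ at the surface level but are equivalent under the encoder incur no predictive loss.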

| LeCun Claim | Counter | Assessment |
|---|---|---|
| LLMs don't understand | They demonstrate understanding on diverse benchmarks | Partially valid: benchmark performance vs. true understanding debated |
| Autoregressive is limited | GPT-4/o1 show complex reasoning | Contested: reasoning may still be pattern matching |
| Need explicit world model | Implicit world models may emerge in LLMs | Open question: Sora debate suggests possible emergence |
| JEPA is superior | No JEPA system matches GPT-4 capability | Currently true: JEPA hasn't demonstrated general task success |
| Aspect | World Models | LLMs | Winner (2025) |
|---|---|---|---|
| Planning mechanism | Explicit MCTS search (80k pos/sec) | Implicit chain-of-thought | Context-dependent |
| World knowledge | Learned dynamics model | Compressed in weights | LLMs (broader) |
| Sample efficiency | 10-500x better (EfficientZero) | Requires billions of tokens | World Models |
| Generalization | Compositional planning | In-context learning | LLMs (more flexible) |
| Task diversity | Requires domain-specific training | Single model, many tasks | LLMs |
| Current SOTA | Games, robotics control | Language, reasoning, code | Domain-dependent |
| Compute at inference | High (search required) | Lower (single forward pass) | LLMs |
| Interpretability | Partial (can query world model) | Low (weights opaque) | World Models |
| Algorithm | Steps to Human-Level (Atari) | Relative Efficiency |
|---|---|---|
| DQN (2015) | 50,000,000 | 1x (baseline) |
| Rainbow (2017) | 10,000,000 | 5x |
| SimPLe (2019) | 100,000 | 500x |
| EfficientZero (2021) | 100,000 | 500x (194% human) |
| DreamerV3 (2023) | Variable | Fixed hyperparameters |

Model-based approaches achieve comparable or superior performance with two to three orders of magnitude less data, which is critical for robotics and other real-world applications where data collection is expensive or dangerous.
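The relative-efficiency figures are simply ratios of reported environment steps against the DQN baseline; a quick check:

```python
# Efficiency ratios as environment-step ratios against the DQN baseline.
steps = {
    "DQN (2015)": 50_000_000,
    "Rainbow (2017)": 10_000_000,
    "SimPLe (2019)": 100_000,
    "EfficientZero (2021)": 100_000,
}
baseline = steps["DQN (2015)"]
ratios = {name: baseline // n for name, n in steps.items()}
print(ratios["EfficientZero (2021)"])  # 500
```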

| Research Area | Why It Applies |
|---|---|
| Reward modeling | Explicit reward models are central |
| Goal specification | Planning objectives are visible |
| Corrigibility | Goals/world model can potentially be modified |
| Interpretability of beliefs | World model predictions can be queried |
| Challenge | Description |
|---|---|
| Reward hacking | Planning will find unexpected ways to maximize reward |
| World model exploitation | The agent may exploit inaccuracies in the world model |
| Power-seeking | Planning may naturally discover instrumental strategies |
| Deceptive planning | Could an agent learn to simulate safe behavior while planning harm? |

Arguments For Growth (40-60% probability of increased importance)

| Factor | Evidence | Strength |
|---|---|---|
| Sample efficiency | 500x improvement demonstrated (EfficientZero) | Strong |
| Robotics demand | Physical tasks need prediction; 2024 surveys show growing adoption | Strong |
| LeCun's advocacy | Meta investing heavily in JEPA research | Moderate |
| Compositionality | Planning naturally combines learned skills | Moderate |
| Scaling evidence | DreamerV3 shows favorable scaling with model size | Moderate |

Arguments Against (40-60% probability of continued niche status)

| Factor | Evidence | Strength |
|---|---|---|
| LLMs dominating | GPT-4, Claude perform well on most general tasks | Strong |
| World model accuracy | Errors compound over the planning horizon | Strong |
| Computational cost | MCTS is expensive; AlphaZero searches 80k vs Stockfish's 70M positions/sec | Moderate |
| Limited generalization | No world model system matches LLM task diversity | Strong |
| Hybrid approaches emerging | LLM + world model combinations may dominate both | Moderate |

Probability Assessment: Paradigm Dominance at TAI

| Scenario | Probability | Reasoning |
|---|---|---|
| World models dominant | 5-10% | Would require a breakthrough in scalable world modeling |
| Hybrid (LLM + world model) dominant | 25-40% | Combines strengths; active research area |
| LLMs dominant (current trajectory) | 40-55% | Empirically winning; massive investment |
| Novel paradigm | 10-20% | Unknown unknowns |
| Uncertainty | Current Evidence | Resolution Timeline | Impact on Safety |
|---|---|---|---|
| Can world models scale to real-world complexity? | DreamerV3 handles 150+ tasks; robotics limited | 2-5 years | High: determines applicability |
| Will hybrid approaches dominate? | Active research (LLM + world model); no clear winner | 3-7 years | Moderate: affects which safety research applies |
| Is explicit planning necessary for AGI? | o1/o3 suggest implicit reasoning works; debate ongoing | 2-5 years | High: determines alignment approaches |
| Can world model accuracy be verified? | No robust methods exist; critical safety gap | 5-10 years | Very High: core alignment concern |
| Will JEPA fulfill LeCun's vision? | I-JEPA/V-JEPA released but limited impact | 2-4 years | Moderate: alternative paradigm |
| Can planning be made safe? | Power-seeking, deception emerge naturally from optimization | 5-15 years | Critical: core alignment problem |
  • Dense Transformers - Alternative LLM approach
  • Neuro-Symbolic - Related hybrid approach
  • Provable Safe - World model verification is central