Contextual Latent World Models for Offline Meta Reinforcement Learning

3 Mar, 2026

Mohammadreza Nakhaei

Aidan Scannell

Kevin Luck

Joni Pajarinen



Abstract

Offline meta-reinforcement learning seeks to learn policies that generalize across related tasks from fixed datasets. Context-based methods infer a task representation from transition histories, but learning effective task representations without supervision remains a challenge. In parallel, latent world models have demonstrated strong self-supervised representation learning through temporal consistency. We introduce contextual latent world models, which condition latent world models on inferred task representations and train them jointly with the context encoder. This enforces task-conditioned temporal consistency, yielding task representations that capture task-dependent dynamics rather than merely discriminating between tasks. Our method learns more expressive task representations and significantly improves generalization to unseen tasks across MuJoCo, Contextual-DeepMind Control, and Meta-World benchmarks.
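The abstract describes three pieces: a context encoder that infers a task representation from transitions, a latent world model conditioned on that representation, and a task-conditioned temporal-consistency objective that trains both jointly. Below is a minimal NumPy sketch of that objective under our own assumptions. The linear-tanh networks, the mean-pooled (permutation-invariant) context encoder, the dimensions, and every name (`context_encoder`, `encode`, `latent_step`, `consistency_loss`) are hypothetical and not taken from the paper. The sketch computes only the forward loss and omits other heads such as reward prediction. In training, gradients of this loss would update the context encoder and the latent dynamics together, with the next-state latent treated as a fixed (stop-gradient) target.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, Z, H = 4, 2, 3, 8  # hypothetical state, action, task-embedding and latent sizes

def init(shape):
    return rng.normal(scale=0.1, size=shape)

# Hypothetical single-layer parameters for each component.
params = {
    "ctx": init((2 * S + A + 1, Z)),  # context encoder: (s, a, r, s') -> z
    "enc": init((S, H)),              # state encoder: s -> latent h
    "dyn": init((H + A + Z, H)),      # latent dynamics: (h, a, z) -> next latent
}

def context_encoder(p, context):
    # context: (N, 2S+A+1) rows of (s, a, r, s'); mean-pooling makes z order-invariant
    return np.tanh(context @ p["ctx"]).mean(axis=0)

def encode(p, s):
    # Map raw states (B, S) to latents (B, H).
    return np.tanh(s @ p["enc"])

def latent_step(p, h, a, z):
    # Predict the next latent, conditioning every row on the same task embedding z.
    zs = np.broadcast_to(z, (h.shape[0], z.shape[-1]))
    return np.tanh(np.concatenate([h, a, zs], axis=-1) @ p["dyn"])

def consistency_loss(p, context, s, a, s_next):
    # Task-conditioned temporal consistency: predicted next latent vs encoded next state.
    z = context_encoder(p, context)
    pred = latent_step(p, encode(p, s), a, z)
    target = encode(p, s_next)  # held fixed (stop-gradient) when differentiating
    return float(np.mean(np.sum((pred - target) ** 2, axis=-1)))

# Usage on random data: 16 context transitions, a batch of 32 transitions.
context = rng.normal(size=(16, 2 * S + A + 1))
s = rng.normal(size=(32, S))
a = rng.normal(size=(32, A))
s_next = rng.normal(size=(32, S))
loss = consistency_loss(params, context, s, a, s_next)
```

Because `z` enters the dynamics prediction, the loss depends on the context encoder's parameters. An autodiff framework would therefore push task-dependent dynamics information into `z`, rather than only features that separate one task from another.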

Type

Publication

arXiv preprint arXiv:2603.02935

Last updated on 5 Mar, 2026

World-Models Reinforcement-Learning Offline-Rl Meta-Rl Representation-Learning

Authors

Aidan Scannell (he/him)

Research Associate
