Lyapunov Density Models: Constraining Distribution Shift in Learning-Based Control

Katie Kang, Paula Gradu, Jason Choi, Michael Janner, Claire Tomlin, Sergey Levine

University of California, Berkeley

ICML, 2022

arXiv / Short Talk / Long Talk / Blog Post

Summary

When deploying learning-based controllers, we seek a mechanism to constrain the agent to states and actions that resemble those in the training data, in order to avoid poor generalization due to distribution shift. However, to remain in-distribution throughout its trajectory, an agent must not only avoid states and actions that are already out-of-distribution, but also those that will inevitably lead to out-of-distribution states and actions in the future. We present Lyapunov density models (LDMs): a generalization of control Lyapunov functions and density models that provides guarantees on an agent's ability to stay in-distribution over its entire trajectory.


Example of a dynamical system and data distribution for which the conventional "greedy" approach of avoiding low-density states and actions fails to keep the agent in-distribution over the entire trajectory, whereas an approach that reasons about the data distribution in a long-horizon, dynamics-aware fashion successfully keeps the agent in-distribution.


Lyapunov Density Models

In machine learning, density models allow us to estimate the training data distribution, but they do not reason about the dynamics of the system we aim to control. In contrast, in control theory, Lyapunov functions provide a mechanism for making long-horizon guarantees about the stability of a system, but they do not depend on any data or distribution. LDMs combine the data-aware aspect of density models with the dynamics-aware aspect of Lyapunov functions, in order to ensure that a system controlled by a learning-based policy is guaranteed to remain in-distribution over a long horizon.
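To make this idea concrete, the block below is a paraphrased sketch (not the paper's exact statement) of the two properties an LDM G(s, a) is meant to combine, assuming deterministic dynamics s' = f(s, a) and writing E(s, a) = -log p_train(s, a) for the negative log-density of the training data.

```latex
% Sketch of the two properties an LDM G(s,a) combines
% (assumed notation: deterministic dynamics f, training density p_train).
\begin{align*}
  &\text{data-aware:}     && G(s, a) \;\ge\; E(s, a) = -\log p_{\mathrm{train}}(s, a), \\
  &\text{dynamics-aware:} && \min_{a'} G\big(f(s, a),\, a'\big) \;\le\; G(s, a).
\end{align*}
% Intuition: if the agent keeps G(s_t, a_t) \le -\log c at every step
% (the second property says this is always possible once it holds),
% then the first property gives p_train(s_t, a_t) \ge c for all t.
```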

Example of data distributions (middle) and their associated LDMs (right) for a 2D linear system (left). LDMs can be viewed as "long-horizon, dynamics-aware" transformations on density models.
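One way to picture this transformation is as a dynamic-programming backup applied to a density model. The sketch below does this for a small discretized system, assuming tabular deterministic dynamics and a tabular negative log-density E; the function and variable names are illustrative placeholders, not the paper's code.

```python
import numpy as np

def ldm_from_density(E, next_state, n_iters=200):
    """Turn a tabular negative log-density E[s, a] into an LDM-style value
    G[s, a] via a max/min dynamic-programming backup (illustrative sketch).

    E          : float array of shape (S, A), E[s, a] = -log p_train(s, a)
    next_state : int array of shape (S, A), deterministic dynamics s' = f(s, a)
    """
    G = E.copy()
    for _ in range(n_iters):
        # Best achievable value from the successor state of each (s, a),
        # assuming the agent then picks the action with the lowest G.
        best_next = G.min(axis=1)[next_state]          # shape (S, A)
        # Backup: never drop below the current density penalty, but also
        # account for the best value reachable from the next state.
        G_new = np.maximum(E, best_next)
        if np.allclose(G_new, G):
            break
        G = G_new
    return G
```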


Using LDMs in Control

We present an algorithm for learning LDMs directly from data, and a method for using an LDM in model-based RL as a constraint on the model optimizer to "shield" against model exploitation. Theoretically, we show that our method maintains guarantees of keeping an agent in-distribution even under approximation errors in the LDM learning process. Empirically, we show that our method outperforms the conventional "greedy" approach to constraining distribution shift on several robotics and medical control problems.
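As a rough illustration of the "shielding" idea (not the paper's implementation), a random-shooting planner could discard candidate action sequences whose predicted states and actions exceed a chosen LDM threshold. The interfaces `ldm`, `dynamics_model`, and `reward_model` below are assumed placeholders.

```python
import numpy as np

def shielded_random_shooting(s0, dynamics_model, reward_model, ldm,
                             horizon=20, n_samples=512, threshold=5.0,
                             action_dim=3, rng=None):
    """Return the first action of the best sampled action sequence, rejecting
    sequences predicted to violate the LDM constraint G(s, a) <= threshold.
    Illustrative sketch only; all model interfaces are assumed."""
    rng = np.random.default_rng() if rng is None else rng
    best_return, best_action = -np.inf, None
    for _ in range(n_samples):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s, total_reward, feasible = s0, 0.0, True
        for a in actions:
            if ldm(s, a) > threshold:      # predicted to leave the in-distribution set
                feasible = False
                break
            total_reward += reward_model(s, a)
            s = dynamics_model(s, a)       # learned (possibly exploitable) model
        if feasible and total_reward > best_return:
            best_return, best_action = total_reward, actions[0]
    return best_action                     # None if no feasible sequence was found
```

In a sketch like this, the threshold plays the role described in the evaluation below: a looser constraint leaves more room for model exploitation, while a tighter one trades task performance for staying close to the data.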

Example evaluation of our method and baseline methods on a hopper control task for different values of the constraint threshold. The constraint threshold lets the user trade off protection against model error against flexibility for performing the desired task. On the right, we show example trajectories when the threshold is too low (hopper falling over due to model exploitation), just right (hopper successfully hopping towards the target), or too high (hopper standing still due to over-conservatism).