Inverse Optimal Heuristic Control for Imitation Learning

Toward high-dimensional imitation learning:

Utilizing fast combinatorial planners for efficient stochastic imitation learning.

Nathan Ratliff, Brian Ziebart, Kevin Peterson, J. Andrew Bagnell, Martial Hebert, Siddhartha Srinivasa

[Figure: frame from Star Wars Episode I: The Phantom Menace]

Inverse optimal heuristic control (IOHC) addresses high-dimensional imitation learning problems by exploiting the observation that many such problems decompose into a combination of low-dimensional long-term behaviors and high-dimensional local behaviors (Ratliff et al., 2009b). Our experiments focus on two problems that exhibit this decomposition: pedestrian prediction and taxi cab navigation. Accurate state representation for both problems requires dynamical variables such as momentum; the state dimensionality therefore grows beyond the two-dimensional position representation commonly used in navigational planning. For both problems, we model the probability of taking an action as inversely related to the sum of the cost of the proposed direction change and the cost-to-go of a two-dimensional navigational planner run from the state that results from taking the proposed action. This work demonstrates some of the first results in using planners to predict pedestrian motion, and it surpasses the previously published state-of-the-art results in taxi cab prediction (Ziebart et al., 2008). Although the resulting optimization is nonconvex, we additionally introduce a number of convex approximations to the objective function and prove theorems showing that these nonconvexities are generally insignificant.
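To make the action model concrete, the following is a minimal illustrative sketch, not the paper's implementation: it assumes the "inversely related" probability takes a Boltzmann (soft-max) form over the combined cost, where `action_costs` (local direction-change costs) and `planner_cost_to_go` (the 2D planner's cost-to-go from each resulting state) are hypothetical inputs that a learned cost function and a combinatorial planner would supply.

```python
import numpy as np

def action_distribution(action_costs, planner_cost_to_go, temperature=1.0):
    """Soft-max distribution over candidate actions.

    action_costs[i]       -- local cost of action i (e.g., direction change)
    planner_cost_to_go[i] -- 2D planner cost-to-go from the state action i reaches

    Both arrays are assumptions for illustration; in IOHC they would come
    from a learned cost function and a fast combinatorial planner.
    """
    total = np.asarray(action_costs, dtype=float) + np.asarray(planner_cost_to_go, dtype=float)
    # Lower combined cost -> higher probability of the action.
    logits = -total / temperature
    logits -= logits.max()          # subtract max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Toy example with three candidate actions: the action with the lowest
# combined cost (0.2 + 3.0 = 3.2) receives the highest probability.
probs = action_distribution([0.2, 1.0, 2.0], [3.0, 2.5, 2.0])
```

The soft-max form keeps every action's probability strictly positive, which matters for the stochastic (probabilistic) view of demonstrated behavior taken here: observed trajectories that deviate from the planner's optimum still receive nonzero likelihood.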