Maximum Margin Planning (MMP)

Nathan Ratliff, J. Andrew Bagnell, Martin Zinkevich, David Silver, David Bradley


My early graduate work capitalized on the idea that directly demonstrating a behavior is often much easier than programming planners to embody that behavior. This research formalized a form ofimitation learning known as learning to plan by reducing it to maximum margin structured classification(Ratliff et al, 2006a link). If a planner can be characterized as minimizing a cost function over paths (e.g. A*, Dijkstra, MDP solvers), then this framework, known as maximum margin planning (MMP), dictates how that cost function can be learned. Within this work, we developed a novel subgradient-based optimization technique for solving maximum margin structured classification problems that converges fast in both theory and practice. We additionally improved generalization bounds for the batch setting and introduced the first online theory for maximum margin structured classification (Ratliff et al, 2007c link).

The above image sequence demonstrates a scenario depicting the type of problem solved by my early work in imitation learning. If a user draws a path between two points on a satellite image that closely follows a well defined road in the image (far left), my system analyzes this demonstration and trains an A* planner to produce plans in new satellite images that consistently follow roads (middle left). Alternatively, if a user draws a path that jumps quickly from the road into thick vegetation before clandestinely making its way to the goal under the cover of protective foliage (middle right), my system trains an A* planner to produce plans in new images that traverse covered vegetative regions en route from start to goal (far right).

Follow-up work developed nonlinear generalizations of these subgradient procedures (see below) based on gradient boosting (Ratliff, et al, 2007a link), including a novel exponentiated functional gradient descent optimization routine (Ratliff et al. 2009c link). These functional gradient optimization techniques demonstrate state-of-the-art performance in a number of application areas including:

overhead navigation, footstep prediction, grasp prediction, heuristic learning, LADAR classification, and optical character recognition.

Collectively, our functional gradient class of optimization routines is known as LEArning to seaRCH(LEARCH).

LEArning to seaRCH (LEARCH):

One of the most frequently voiced complaints against machine learning by roboticists is feature selection. Much of the theoretical machine learning work pervading the literature develops results for the linear setting. In this setting, learners operate within a predefined feature space and theoretical statements are presented relative to the best hypothesis in the space (particularly for PAC and online models). However, finding a feature space that provides a rich enough hypothesis space to result in good overall performance is often tricky.

For this reason, we developed a class of functional gradient optimization procedures known as LEArning to seaRCH (LEARCH) for solving the maximum margin planning problem. These optimization routines utilize both function spaces and hypothesis exponentiation to significantly improve the dynamic range of the learned cost functions. This class of algorithms develops around a novel optimization tool that we call functional exponentiated gradient descent. The above figure demonstrates that LEARCH algorithms can successfully learn real-world concepts using simply a collection of smoothed image as its baseline feature set. The left-most image shows an example path; the middle image shows the best performance of a linear model using this base-line feature set; and the final image shows the successful hypothesis learned using LEARCH optimization.