# Publications

**For teaching documents and tutorials see the ****pedagogy**** page. See**** ****my ****Google Scholar**** page for an up-to-date ****list of publications****.**

**RMP-flow: A Computational Graph for Automatic Motion Policy Generation**

*Ching-An Cheng, Mustafa Mukadam, Jan Issac, Stan Birch.field, Dieter Fox, Byron Boots, and Nathan Ratliff*

WAFR 2018

**Abstract:** In this paper, we develop a novel policy synthesis algorithm, RMP-flow, based on geometrically consistent transformations of Riemannian Motion Policies (RMPs). RMPs are a class of reactive motion policies designed to parameterize non-Euclidean behaviors as dynamical systems in intrinsically nonlinear task spaces. Given a set of RMPs designed for individual tasks, RMP-flow can consistently combine these local policies to generate an expressive global policy, while simultaneously exploiting sparse structure for computational efficiency. We study the geometric properties of RMP-flow and provide sufficient conditions for stability. Finally, we experimentally demonstrate that accounting for the geometry of task policies can simplify classically difficult problems, such as planning through clutter on high-DOF manipulation systems.

**Predictor-Corrector Policy Optimization**

*Ching-An Cheng, Xinyan Yan, Nathan Ratliff, Byron Boots*

ArXiv preprint 15 Oct 2018

**Abstract:** We present a predictor-corrector framework, called PicCoLO, that can transform a first-order model-free reinforcement or imitation learning algorithm into a new hybrid method that leverages predictive models to accelerate policy learning. The new "PicCoLOed" algorithm optimizes a policy by recursively repeating two steps: In the Prediction Step, the learner uses a model to predict the unseen future gradient and then applies the predicted estimate to update the policy; in the Correction Step, the learner runs the updated policy in the environment, receives the true gradient, and then corrects the policy using the gradient error. Unlike previous algorithms, PicCoLO corrects for the mistakes of using imperfect predicted gradients and hence does not suffer from model bias. The development of PicCoLO is made possible by a novel reduction from predictable online learning to adversarial online learning, which provides a systematic way to modify existing first-order algorithms to achieve the optimal regret with respect to predictable information. We show, in both theory and simulation, that the convergence rate of several first-order model-free algorithms can be improved by PicCoLO.

*Nathan D. Ratliff, Jan Issac, Daniel Kappler, Stan Birchfield, Dieter Fox*

ArXiv preprint 9 Jan 2018 (v1), last revised 25 Jul 2018 (v3))

**Abstract: **We introduce the Riemannian Motion Policy (RMP), a new mathematical object for modular motion generation. An RMP is a second-order dynamical system (acceleration field or motion policy) coupled with a corresponding Riemannian metric. The motion policy maps positions and velocities to accelerations, while the metric captures the directions in the space important to the policy. We show that RMPs provide a straightforward and convenient method for combining multiple motion policies and transforming such policies from one space (such as the task space) to another (such as the configuration space) in geometrically consistent ways. The operators we derive for these combinations and transformations are provably optimal, have linearity properties making them agnostic to the order of application, and are strongly analogous to the covariant transformations of natural gradients popular in the machine learning literature. The RMP framework enables the fusion of motion policies from different motion generation paradigms, such as dynamical systems, dynamic movement primitives (DMPs), optimal control, operational space control, nonlinear reactive controllers, motion optimization, and model predictive control (MPC), thus unifying these disparate techniques from the literature. RMPs are easy to implement and manipulate, facilitate controller design, simplify handling of joint limits, and clarify a number of open questions regarding the proper fusion of motion generation methods (such as incorporating local reactive policies into long-horizon optimizers). We demonstrate the effectiveness of RMPs on both simulation and real robots, including their ability to naturally and efficiently solve complicated collision avoidance problems previously handled by more complex planners.

**Robust Learning of Tactile Force Estimation through Robot Interaction**

*Balakumar Sundaralingam, Alexander (Sasha) Lambert, Ankur Handa, Byron Boots, Tucker Hermans, Stan Birchfield, Nathan Ratliff, Dieter Fox*

ArXiv preprint 15 Oct 2018

**Abstract: **Current methods for estimating force from tactile sensor signals are either inaccurate analytic models or task-specific learned models. In this paper, we explore learning a robust model that maps tactile sensor signals to force. We specifically explore learning a mapping for the SynTouch BioTac sensor via neural networks. We propose a voxelized input feature layer for spatial signals and leverage information about the sensor surface to regularize the loss function. To learn a robust tactile force model that transfers across tasks, we generate ground truth data from three different sources: (1) the BioTac rigidly mounted to a force torque sensor, (2) a robot interacting with a ball rigidly attached to a force sensor to collect a wide range of force readings, and (3) through force inference on a planar pushing task by formalizing the mechanics as a system of particles and optimizing over the object motion. A total of 140k samples were collected from the three sources. To study generalization, we evaluate using the learned model to estimate force inside a feedback controller performing grasp stabilization and object placement. We achieve a median angular accuracy of 0.06 radians in predicting force direction~(66% improvement over the current state of the art) and a median magnitude accuracy of 0.06 N (93% improvement) on a test dataset. Our results can be found on this https URL.

**Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience**

*Yevgen Chebotar, Ankur Handa, Viktor Makoviychuk, Miles Macklin, Jan Issac, Nathan Ratliff, Dieter Fox*

ArXiv preprint 2018.

**Abstract: **We consider the problem of transferring policies to the real world by training on a distribution of simulated scenarios. Rather than manually tuning the randomization of simulations, we adapt the simulation parameter distribution using a few real world roll-outs interleaved with policy training. In doing so, we are able to change the distribution of simulations to improve the policy transfer by matching the policy behavior in simulation and the real world. We show that policies trained with our method are able to reliably transfer to different robots in two real world tasks: swing-peg-in-hole and opening a cabinet drawer. The video of our experiments can be found at this https URL.

**Real-time Perception meets Reactive Motion Generation**

*Daniel Kappler, Franziska Meier, Jan Issac, Jim Mainprice, Cristina Garcia Cifuentes, Manuel Wüthrich, Vincent Berenz, Stefan Schaal, Nathan Ratliff, Jeannette Bohg*

IEEE RAL 2018

**Abstract:** We address the challenging problem of robotic grasping and manipulation in the presence of uncertainty. This uncertainty is due to noisy sensing, inaccurate models and hard-to-predict environment dynamics. We quantify the importance of continuous, real-time perception and its tight integration with reactive motion generation methods in dynamic manipulation scenarios. We compare three different systems that are instantiations of the most common architectures in the field: (i) a traditional sense-plan-act approach that is still widely used, (ii) a myopic controller that only reacts to local environment dynamics and (iii) a reactive planner that integrates feedback control and motion optimization. All architectures rely on the same components for real-time perception and reactive motion generation to allow a quantitative evaluation. We extensively evaluate the systems on a real robotic platform in four scenarios that exhibit either a challenging workspace geometry or a dynamic environment. In 333 experiments, we quantify the robustness and accuracy that is due to integrating real-time feedback at different time scales in a reactive motion generation system. We also report on the lessons learned for system building.

**A New Data Source for Inverse Dynamics Learning**

*Daniel Kappler, Franziska Meier, Nathan Ratliff, Stefan Schaal*

IROS 2017

**Abstract: **Modern robotics is gravitating toward increasingly collaborative human robot interaction. Tools such as acceleration policies can naturally support the realization of reactive, adaptive, and compliant robots. These tools require us to model the system dynamics accurately -- a difficult task. The fundamental problem remains that simulation and reality diverge--we do not know how to accurately change a robot's state. Thus, recent research on improving inverse dynamics models has been focused on making use of machine learning techniques. Traditional learning techniques train on the actual realized accelerations, instead of the policy's desired accelerations, which is an indirect data source. Here we show how an additional training signal -- measured at the desired accelerations -- can be derived from a feedback control signal. This effectively creates a second data source for learning inverse dynamics models. Furthermore, we show how both the traditional and this new data source, can be used to train task-specific models of the inverse dynamics, when used independently or combined. We analyze the use of both data sources in simulation and demonstrate its effectiveness on a real-world robotic platform. We show that our system incrementally improves the learned inverse dynamics model, and when using both data sources combined converges more consistently and faster.

**Real-Time Natural Language Corrections for Assistive Manipulation Tasks**

*A. Broad, J. Arkin, N. Ratliff, T. Howard and B. Argall.*

The International Journal of Robotics Research (IJRR) 2017.

**Abstract: **We propose a generalizable natural language interface that allows users to provide corrective instructions to an assistive robotic manipulator in real-time. This work is motivated by the desire to improve collaboration between humans and robots in a home environment. Allowing human operators to modify properties of how their robotic counterpart achieves a goal on-the-fly increases the utility of the system by incorporating the strengths of the human partner (e.g. visual acuity and environmental knowledge). This work is applicable to users with and without disability. Our natural language interface is based on the Distributed Correspondence Graph, a probabilistic graphical model that assigns semantic meaning to user utterances in the context of the robot’s environment and current behavior. We then use the desired corrections to alter the behavior of the robotic manipulator by treating the modifications as constraints on the motion generation (planning) paradigm. In this paper, we highlight four dimensions along which a user may wish to correct the behavior of his or her assistive manipulator. We develop our language model using data collected from Amazon Mechanical Turk to capture a comprehensive sample of terminology that people use to describe desired corrections. We then develop an end-to-end system using open-source speech-to-text software and a Kinova Robotics MICO robotic arm. To demonstrate the efficacy of our approach, we run a pilot study with users unfamiliar with robotic systems and analyze points of failure and future directions.

**Warping the Workspace Geometry with Electric Potentials for Motion Optimization of Manipulation Tasks**

*Jim Mainprice, Nathan Ratliff, Stefan Schaal*

*Proc. IROS 2016, October 2016.*

**Abstract: **In this paper we present motion optimization algorithms for computing manipulation motions in presence of obstacles. Our approach builds a geometric representation of the workspace by constructing Riemannian metrics using electric potentials emanating from the workspace obstacles. Velocity of the robot’s body is measured with respect to this metric instead of traditional Cartesian velocity. Empirical results demonstrate that Riemannian metrics are better handled by optimizers that leverage objective and constraint functions’ second order information. This information encodes how the Riemannian geometry of the modeled workspace pulls back into the configuration space. We also show that despite the additional computational burden of computing the electric-potential based metric, it results in faster overall convergence than reasoning on Euclidean geometry alone. Evaluation is made efficient by cashing the electric potential in voxel grids and using Tri-cubic spline interpolation to assess the potentials gradient.

**Towards Robust Online Inverse Dynamics Learning**

*Franziska Meier, Daniel Kappler, Nathan Ratliff, Stefan Schaal*

*Proc. IROS 2016, October 2016.*

**Abstract:** Learning of inverse dynamics modeling errors is key for compliant or force control when analytical models are only rough approximations. Thus, designing real time capable function approximation algorithms has been a necessary focus towards the goal of online model learning. However, because these approaches learn a mapping from actual state and acceleration to torque, good tracking is required to observe data points on the desired path. Recently it has been shown how online gradient descent on a simple modeling error offset term to minimize tracking at acceleration level can address this issue. However, to adapt to larger errors a high learning rate of the online learner is required, resulting in reduced compliancy. Thus, here we propose to combine both approaches: The online adapted offset term ensures good tracking such that a nonlinear function approximator is able to learn an error model on the desired trajectory. This, in turn, reduces the load on the adaptive feedback, enabling it to use a lower learning rate. Combined this creates a controller with variable feedback and low gains, and a feedforward model that can account for larger modeling errors. We demonstrate the effectiveness of this framework, in simulation and on a real system.

**DOOMED: Direct Online Optimization of Modeling Errors in Dynamics**

*Nathan Ratliff, Franziska Meier, Daniel Kappler, Stefan Schaal*

*Tech Report: arXiv:1608.00309*

*(Patent pending technology)*

[**Older version****.**** ***Please reference the ArXiv version*]

**Abstract:** It has long been hoped that model-based control will improve tracking performance while maintaining or increasing compliance. This hope hinges on having or being able to estimate an accurate inverse dynamics model. As a result, substantial effort has gone into modeling and estimating dynamics (error) models. Most recent research has focused on learning the true inverse dynamics using data points mapping observed accelerations to the torques used to generate them. Unfortunately, if the initial tracking error is bad, such learning processes may train substantially off-distribution to predict well on actual observed acceleration rather then the desired accelerations. This work takes a different approach. We define a class of gradient-based online learning algorithms we term Direct Online Optimization for Modeling Errors in Dynamics (DOOMED) that directly minimize an objective measuring the divergence between actual and desired accelerations. Our objective is defined in terms of the true system's unknown dynamics and is therefore impossible to evaluate. However, we show that its gradient is measurable online from system data. We develop a novel adaptive control approach based on running online learning to directly correct (inverse) dynamics errors in real time using the data stream from the robot to accurately achieve desired accelerations during execution.

**On the Fundamental Importance of Gauss-Newton in Motion Optimization**

*Nathan Ratliff, Marc Toussaint, Jeannette Bohg, Stefan Schaal *

*Tech Report: Max Planck Institute for Intelligent Systems. July 2015.*

**Abstract:** Hessian information speeds convergence substantially in motion optimization. The better the Hessian approximation the better the convergence. But how good is a given approximation theoretically? How much are we losing? This paper addresses that question and proves that for a particularly popular and empirically strong approximation known as the Gauss-Newton approximation, we actually lose very little—for a large class of highly expressive objective terms, the true Hessian actually limits to the Gauss-Newton Hessian quickly as the trajectory’s time discretization becomes small. This result both motivates it’s use and offers insight into computationally efficient design. For instance, traditional representations of kinetic energy exploit the generalized inertia matrix whose derivatives are usually difficult to compute. We introduce here a novel reformulation of rigid body kinetic energy designed explicitly for fast and accurate curvature calculation. Our theorem proves that the Gauss-Newton Hessian under this formulation efficiently captures the kinetic energy curvature, but requires only as much computation as a single evaluation of the traditional representation. Additionally, we introduce a technique that exploits these ideas implicitly using Cholesky decompositions for some cases when similar objective terms reformulations exist but may be difficult to find. Our experiments validate these findings and demonstrate their use on a real-world motion optimization system for high-dof motion generation.

**Direct Loss Minimization Inverse Optimal Control**

*Andreas Dörr, Nathan Ratliff, Jeannette Bohg, Marc Toussaint, Stefan Schaal *

*Proc. Robotics: Science and Systems (RSS) 2015, June 2015.*

**Abstract:** Inverse Optimal Control (IOC) has strongly impacted the systems engineering process, enabling automated planner tuning through straightforward and intuitive demonstration. The most successful and established applications, though, have been in lower dimensional problems such as navigation planning where exact optimal planning or control is feasible. In higher dimensional systems, such as humanoid robots, research has made substantial progress toward generalizing the ideas to model free or locally optimal settings, but these systems are complicated to the point where demonstration itself can be difficult. Typically, real-world applications are restricted to at best noisy or even partial or incomplete demonstrations that prove cumbersome in existing frameworks. This work derives a very flexible method of IOC based on a form of Structured Prediction known as Direct Loss Minimization. The resulting algorithm is essentially Policy Search on a reward function that rewards similarity to demonstrated behavior (using Covariance Matrix Adaptation (CMA) in our experiments). Our framework blurs the distinction between IOC, other forms of Imitation Learning, and Reinforcement Learning, enabling us to derive simple, versatile, and practical algorithms that blend imitation and reinforcement signals into a unified framework. Our experiments analyze various aspects of its performance and demonstrate its efficacy on conveying preferences for motion shaping and combined reach and grasp quality optimization.

**Understanding the Geometry of Workspace Obstacles in Motion Optimization**

*Nathan Ratliff, Marc Toussaint, Stefan Schaal * [**longer tech report**][**project page**][**movies**]

*Proc. ICRA 2015, May 2015.*

**Abstract:** What is it that makes movement around obstacles hard? The answer seems clear: obstacles contort the geometry of the workspace and make it difficult to leverage what we consider easy and intuitive straight-line Cartesian geometry. But is Cartesian motion actually easy? It's certainly well-understood and has numerous applications. But beneath the details of linear algebra and pseudoinverses, lies a non-trivial Riemannian metric driving the solution. Cartesian motion is easy only because the pseudoinverse, our powerhouse tool, correctly represents how Euclidean workspace geometry pulls back into the configuration space. In light of that observation, it reasons that motion through a field of obstacles could be just as easy as long as we correctly account for how those obstacles warp the geometry of the space. This paper explores extending our geometric model of the robot beyond the notion of a Cartesian workspace space to fully model and leverage how geometry changes in the presence of obstacles. Intuitively, impenetrable obstacles form topological holes and geodesics curve accordingly around them. We formalize this intuition and develop a general motion optimization framework called Riemannian Motion Optimization (RieMO) to efficiently find motions using our geometric models. We also prove that Gauss-Newton approximations for Hessian calculations across the entire trajectory are natural representations of the problem's first-order Riemannian Geometry and capture all of the relevant curvature information encoded in the Hessian. Our experiments demonstrate that, for many problems, obstacle avoidance can be much more natural when placed within the right geometric context.

**Dual Execution of Optimized Contact Interaction Trajectories**

*Marc Toussaint, Nathan Ratliff, Jeannette Bohg, Ludovic Righetti, Peter Englert, Stefan Schaal*

*Proc. IROS 2014, September 2014.*

**Abstract:** Efficient manipulation requires contact to reduce uncertainty. The manipulation literature refers to this as funneling: a methodology for increasing reliability and robustness by leveraging haptic feedback and control of environmental interaction. However, there is a fundamental gap between traditional approaches to trajectory optimization and this concept of robustness by funneling: traditional trajectory optimizers do not discover force feedback strategies. From a POMDP perspective, these behaviors could be regarded as explicit observation actions planned to sufficiently reduce uncertainty thereby enabling a task. While we are sympathetic to the full POMDP view, solving full continuous-space POMDPs in high-dimensions is hard. In this paper, we propose an alternative approach in which trajectory optimization objectives are augmented with new terms that reward uncertainty reduction through contacts, explicitly promoting funneling. This augmentation shifts the responsibility of robustness toward the actual execution of the optimized trajectories. Directly tracing trajectories through configuration space would lose all robustness---*dual execution* achieves robustness by devising force controllers to reproduce the temporal interaction profile encoded in the *dual* solution of the optimization problem. This work introduces dual execution in depth and analyze its performance through robustness experiments in both simulation and on a real-world robotic platform.

**Learning Geometric Reductions for Planning and Control**

*Nathan Ratliff*

*ICML Workshop on Robot Learning, June 2013.*

**Abstract:** Obstacles in an environment inherently affect the its geometry. A straight line between two points is no longer optimal and may not even be feasible. This paper explores that idea directly and develops algorithms to learn and efficiently leverage the underlying geometry of the environment for planning and control. Specifically, we uncover a problem’s geometry by formulating the problem as one of finding a diffeomorphism (smooth and invertible mapping) from the configuration space to a latent Euclidean space where curved geodesics in the configuration space map to well understood and computationally efficient straight lines in the latent space. Diffeomorphisms in this context act to reduce more complex problems to simpler more fundamental and easily controlled ones. We provided some preliminary experimental results using these techniques and discuss their advantages, limitations, and generalizations.

**CHOMP: Covariant Hamiltonian Optimization for Motion Planning**

*Matthew Zucker**, **Nathan Ratliff**, **Anca Dragan**, Mihail Pivtoraiko, **Matthew Klingensmith**, **Christopher Dellin**, **J. Andrew (Drew) Bagnell**, and **Siddhartha Srinivasa*

*International Journal of Robotics Research, May 2013.*

**Abstract:** In this paper, we present CHOMP (Covariant Hamiltonian Optimization for Motion Planning), a method for trajectory optimization invariant to reparametrization. CHOMP uses functional gradient techniques to iteratively improve the quality of an initial trajectory, optimizing a functional that trades off between a smoothness and an obstacle avoidance component. CHOMP can be used to locally optimize feasible trajectories, as well as to solve motion planning queries, converging to low- cost trajectories even when initialized with infeasible ones. It uses Hamiltonian Monte Carlo to alleviate the problem of convergence to high-cost local minima (and for probabilistic completeness), and is capable of respecting hard constraints along the trajectory. We present extensive experiments with CHOMP on manipulation and locomotion tasks, using 7-DOF manipulators and a rough-terrain quadruped robot.

**History Dependent Domain Adaptation**

*Allen Lavoie, Matt Otey, Nathan Ratliff, and D. Sculley*

*Domain Adaptation Workshop at NIPS '11, December 2011.*

**Abstract:** We study a novel variant of the domain adaptation problem, in which the loss function on test data changes due to dependencies on prior predictions. One important instance of this problem area occurs in settings where it is more costly to make a new error than to repeat a previous error. We propose several methods for learning effectively in this setting, and test them empirically on the real-world tasks of malicious URL classification and adversarial advertisement detection.

**Semi-supervised Learning With Density Based Distances**

*Avleen S. Bijral, Nathan Ratliff, and Nati Srebro*

*27th Conference on Uncertainty in Artificial Intelligence (UAI 2011), July, 2011.*

**Abstract:** We present a simple yet eﬀective approach to Semi-Supervised Learning. Our approach is based on estimating density-based distances (DBD) using a shortest path calculation on a graph. These Graph-DBD estimates can then be used in any distancebased supervised learning method, such as Nearest Neighbor methods and SVMs with RBF kernels. In order to apply the method to very large data sets, we also present a novel algorithm which integrates nearest neighbor computations into the shortest path search and can ﬁnd exact shortest paths even in extremely large dense graphs. Signiﬁcant runtime improvement over the commonly used Laplacian regularization method is then shown on a large scale dataset.

**Manipulation Planning with Goal Sets Using Constrained Trajectory Optimization**

*Anca Dragan, Nathan Ratliff, and Siddhartha Srinivasa*

*2011 IEEE International Conference on Robotics and Automation, May, 2011.*

**Abstract:** Goal sets are omnipresent in manipulation: picking up objects, placing them on counters or in bins, handing them off — all of these tasks encompass continuous sets of goals. This paper describes how to design optimal trajectories that exploit goal sets. We extend CHOMP (Covariant Hamiltonian Optimization for Motion Planning), a recent trajectory optimizer that has proven effective on high-dimensional problems, to handle trajectory-wide constraints, and relate the solution to the intuition of taking unconstrained steps and subsequently projecting them onto the constraints. We then show how this projection simpliﬁes for goal sets (i.e. constraints that affect only the end-point). Finally, we present experiments on a personal robotics platform that show the importance of exploiting goal sets in trajectory optimization for day-to-day manipulation tasks.

**Planning-based Prediction for Pedestrians**

*Brian D. Ziebart, Nathan Ratliff, Garratt Gallagher, Christoph Mertz, Kevin Peterson, J. Andrew (Drew) Bagnell, Martial Hebert, Anind Dey, and Siddhartha Srinivasa*

*Proc. IROS 2009, October, 2009.*

**Abstract:** In this paper, we describe a novel uncertainty-based technique for predicting the future motions of a moving person. Our model assumes that people behave purposefully – efficiently acting to reach intended destinations. We employ the Markov decision process framework and the principle of maximum entropy to obtain a probabilistic, approximately optimal model of human behavior that admits efficient inference and learning algorithms. The method learns a cost function of features of the environment that explains previously observed behavior. This enables generalization to physical changes in the environment, and entirely different environments. Our approach enables robots to plan paths that balance time-to-goal and pedestrian disruption. We quantitatively show the improvement provided by our approach.

**Learning to search: Functional gradient techniques for imitation learning**

*Nathan Ratliff, David Silver, and J. Andrew (Drew) Bagnell*

*Autonomous Robots, Vol. 27, No. 1, July, 2009, pp. 25-53.*

**Abstract:** Programming robot behavior remains a challenging task. While it is often easy to abstractly define or even demonstrate a desired behavior, designing a controller that embodies the same behavior is difficult, time consuming, and ultimately expensive. The machine learning paradigm offers the promise of enabling “programming by demonstration” for developing high-performance robotic systems.

Unfortunately, many “behavioral cloning” (Bain and Sammut in Machine intelligence agents. London: Oxford University Press, 1995; Pomerleau in Advances in neural information processing systems 1, 1989; LeCun et al. in Advances in neural information processing systems 18, 2006) approaches that utilize classical tools of supervised learning (e.g. decision trees, neural networks, or support vector machines) do not fit the needs of modern robotic systems. These systems are often built atop sophisticated planning algorithms that efficiently reason far into the future; consequently, ignoring these planning algorithms in lieu of a supervised learning approach often leads to myopic and poor-quality robot performance.

While planning algorithms have shown success in many real-world applications ranging from legged locomotion (Chestnutt et al. in Proceedings of the IEEE-RAS international conference on humanoid robots, 2003) to outdoor unstructured navigation (Kelly et al. in Proceedings of the international symposium on experimental robotics (ISER), 2004; Stentz et al. in AUVSI’s unmanned systems, 2007), such algorithms rely on fully specified cost functions that map sensor readings and environment models to quantifiable costs. Such cost functions are usually manually designed and programmed.

Recently, a set of techniques has been developed that explore learning these functions from expert human demonstration. These algorithms apply an inverse optimal control approach to find a cost function for which planned behavior mimics an expert’s demonstration. The work we present extends the Maximum Margin Planning (MMP) (Ratliff et al. in Twenty second international conference on machine learning (ICML06), 2006a) framework to admit learning of more powerful, non-linear cost functions. These algorithms, known collectively as LEARCH (LEArning to seaRCH), are simpler to implement than most existing methods, more efficient than previous attempts at non-linearization (Ratliff et al. in NIPS, 2006b), more naturally satisfy common constraints on the cost function, and better represent our prior beliefs about the function’s form.

We derive and discuss the framework both mathematically and intuitively, and demonstrate practical real-world performance with three applied case-studies including legged locomotion, grasp planning, and autonomous outdoor unstructured navigation. The latter study includes hundreds of kilometers of autonomous traversal through complex natural environments. These case-studies address key challenges in applying the algorithm in practical settings that utilize state-of-the-art planners, and which may be constrained by efficiency requirements and imperfect expert demonstration.

**Self-supervised aerial image analysis for extracting parking lot structure**

*Young-Woo Seo, Nathan Ratliff, and Christopher Urmson*

*Proceedings of International Joint Conference on Artificial Intelligence (IJCAI-09), July, 2009.*

**Abstract: **Road network information simplifies autonomous driving by providing strong priors about environments. It informs a robotic vehicle with where it can drive, models of what can be expected, and contextual cues that influence driving behaviors. Currently, however, road network information is manually generated using a combination of GPS survey and aerial imagery. These manual techniques are labor intensive and error prone. To fully exploit the benefits of digital imagery, these processes should be automated. As a step toward this goal, we present an algorithm that extracts the structure of a parking lot visible from a given aerial image. To minimize human intervention in the use of aerial imagery, we devise a self-supervised learning algorithm that automatically generates a set of parking spot templates to learn the appearance of a parking lot and estimates the structure of the parking lot from the learned model. The data set extracted from a single image alone is too small to sufficiently learn an accurate parking spot model. However, strong priors trained using large data sets collected across multiple images dramatically improve performance. Our self-supervised approach outperforms the prior alone by adapting the distribution of examples toward that found in the current image. A thorough empirical analysis compares leading state-of-the-art learning techniques on this problem.

**CHOMP: Gradient Optimization Techniques for Efficient Motion Planning**

*Nathan Ratliff, Matthew Zucker, J. Andrew (Drew) Bagnell, and Siddhartha Srinivasa*

*IEEE International Conference on Robotics and Automation (ICRA), May, 2009.*

**Abstract: **Existing high-dimensional motion planning algorithms are simultaneously overpowered and underpowered. In domains sparsely populated by obstacles, the heuristics used by sampling-based planners to navigate “narrow passages” can be needlessly complex; furthermore, additional post-processing is required to remove the jerky or extraneous motions from the paths that such planners generate. In this paper, we present CHOMP, a novel method for continuous path refinement that uses covariant gradient techniques to improve the quality of sampled trajectories. Our optimization technique converges over a wider range of input paths and is able to optimize higher-order dynamics of trajectories than previous path optimization strategies. As a result, CHOMP can be used as a standalone motion planner in many real-world planning queries. The effectiveness of our proposed method is demonstrated in manipulation planning for a 6-DOF robotic arm as well as in trajectory generation for a walking quadruped robot.

**Learning to Search: Structured Prediction Techniques for Imitation Learning**

*Nathan Ratliff*

*Doctoral dissertation, Carnegie Mellon University, Robotics Institute 2009*

**Abstract:** Modern robots successfully manipulate objects, navigate rugged terrain, drive in urban settings, and play world-class chess. Unfortunately, programming these robots is challenging, time-consuming and expensive; the parameters governing their behavior are often unintuitive, even when the desired behavior is clear and easily demonstrated. Inspired by successful end-to-end learning systems such as neural network controlled driving platforms (Pomerleau, 1989), learning-based “programming by demonstration” has gained currency as a method to achieve intelligent robot behavior. Unfortunately, with highly structured algorithms at their core, modern robotic systems are hard to train using classical learning techniques. Rather than redefining robot architectures to accommodate existing learning algorithms, this thesis develops learning techniques that leverage the performance of modern robotic components.

We begin with a discussion of a novel imitation learning framework we call Maximum Margin Planning which automates finding a cost function for optimal planning and control algorithms such as A*. In the linear setting, this framework has firm theoretical backing in the form of strong generalization and regret bounds. Further, we have developed practical nonlinear generalizations that are effective and efficient for real-world problems. This framework reduces imitation learning to a modern form of machine learning known as Maximum Margin Structured Classification (Taskar et al., 2005); these algorithms, therefore, apply both specifically to training existing state-of-the-art planners as well as broadly to solving a range of structured prediction problems of importance in learning and robotics.

In difficult high-dimensional planning domains, such as those found in many manipulation problems, high-performance planning technology remains a topic of much research. We close with some recent work which moves toward simultaneously advancing this technology while retaining the learnability developed above.

Throughout the thesis, we demonstrate our algorithms on a range of applications including overhead navigation, quadrupedal locomotion, heuristic learning, manipulation planning, grasp prediction, driver prediction, pedestrian prediction, optical character recognition, and LADAR classification.

**Inverse Optimal Heuristic Control for Imitation Learning**

*Nathan Ratliff, Brian D. Ziebart, Kevin Peterson, J. Andrew (Drew) Bagnell, Martial Hebert, Anind Dey, and Siddhartha Srinivasa*

*Twelfth International Conference on Artificial Intelligence and Statistics (AIStats), April, 2009.*

**Abstract: **One common approach to imitation learning is behavioral cloning (BC), which employs straight-forward supervised learning (i.e., classification) to directly map observations to controls. A second approach is inverse optimal control (IOC), which formalizes the problem of learning sequential decision-making behavior over long horizons as a problem of recovering a utility function that explains observed behavior. This paper presents inverse optimal heuristic control (IOHC), a novel approach to imitation learning that capitalizes on the strengths of both paradigms. It employs long-horizon IOC-style modeling in a low-dimensional space where inference remains tractable, while incorporating an additional descriptive set of BC-style features to guide a higher-dimensional overall action selection. We provide experimental results demonstrating the capabilities of our model on a simple illustrative problem as well as on two real world problems: turn-prediction for taxi drivers, and pedestrian prediction within an office environment.

**BiSpace Planning: Concurrent Multi-Space Exploration**

*Rosen Diankov, Nathan Ratliff, David Ferguson, Siddhartha Srinivasa, and James Kuffner*

*Robotics: Science and Systems, June, 2008.*

**Abstract: **We present a planning algorithm called BiSpace that produces fast plans to complex high-dimensional problems by simultaneously exploring multiple spaces. We specifically focus on finding robust solutions to manipulation and grasp planning problems by using BiSpace? special characteristics to explore the work and configuration spaces of the environment and robot. Furthermore, we present a number of techniques for constructing informed heuristics to intelligently search through these high-dimensional spaces. In general, the BiSpace planner is applicable to any problem involving multiple search spaces.

**Imitation Learning for Locomotion and Manipulation**

*Nathan Ratliff, J. Andrew (Drew) Bagnell, and Siddhartha Srinivasa*

*IEEE-RAS International Conference on Humanoid Robots, November, 2007.*

[longer version]

**Abstract: **Decision making in robotics often involves computing an optimal action for a given state, where the space of actions under consideration can potentially be large and state dependent. Many of these decision making problems can be naturally formalized in the multiclass classification framework, where actions are regarded as labels for states. One powerful approach to multiclass classification relies on learning a function that scores each action; action selection is done by returning the action with maximum score. In this work, we focus on two imitation learning problems in particular that arise in robotics. The first problem is footstep prediction for quadruped locomotion, in which the system predicts next footstep locations greedily given the current four-foot configuration of the robot over a terrain height map. The second problem is grasp prediction, in which the system must predict good grasps of complex free-form objects given an approach direction for a robotic hand. We present experimental results of applying a recently developed functional gradient technique for optimizing a structured margin formulation of the corresponding large non-linear multiclass classification problems.

**(Online) Subgradient Methods for Structured Prediction**

*Nathan Ratliff, J. Andrew (Drew) Bagnell, and Martin Zinkevich*

*Eleventh International Conference on Artificial Intelligence and Statistics (AIStats), March, 2007.*

**Abstract: **Promising approaches to structured learning problems have recently been developed in the maximum margin framework. Unfortunately, algorithms that are computationally and memory efficient enough to solve large scale problems have lagged behind. We propose using simple subgradient-based techniques for optimizing a regularized risk formulation of these problems in both online and batch settings, and analyze the theoretical convergence, generalization, and robustness properties of the resulting techniques. These algorithms are are simple, memory efficient, fast to converge, and have small regret in the online setting. We also investigate a novel convex regression formulation of structured learning. Finally, we demonstrate the benefits of the subgradient approach on three structured prediction problems.

**Kernel Conjugate Gradient for Fast Kernel Machines**

*Nathan Ratliff and J. Andrew (Drew) Bagnell*

*International Joint Conference on Artificial Intelligence, January, 2007.*

**Abstract: **We propose a novel variant of the conjugate gradient algorithm,Kernel Conjugate Gradient (KCG), designed to speed up learning for kernel machines with differentiable loss functions. This approach leads to a better conditioned optimization problem during learning. We establish an upper bound on the number of iterations for KCG that indicates it should require less than the square root of the number of iterations that standard conjugate gradient requires. In practice, for various differentiable kernel learning problems, we find KCG consistently, and significantly, outperforms existing techniques. The algorithm is simple to implement, requires no more computation per iteration than standard approaches, and is well motivated by Reproducing Kernel Hilbert Space (RKHS) theory. We further show that data-structure techniques recently used to speed up kernel machine approaches are well matched to the algorithm by reducing the dominant costs of training: function evaluation and RKHS inner product computation.

**Boosting Structured Prediction for Imitation Learning**

*Nathan Ratliff, David Bradley, J. Andrew (Drew) Bagnell, and Joel Chestnutt*

*Advances in Neural Information Processing Systems 19, 2007.*

**Abstract: **The Maximum Margin Planning (MMP) algorithm solves imitation learning problems by learning linear mappings from features to cost functions in a planning domain. The learned policy is the result of minimum-cost planning using these cost functions. These mappings are chosen so that example policies (or trajectories) given by a teacher appear to be lower cost (with a loss-scaled margin) than any other policy for a given planning domain. We provide a novel approach, MMPBoost, based on the functional gradient descent view of boosting that extends MMP by ``boosting'' in new features. This approach uses simple binary classification or regression to improve performance of MMP imitation learning, and naturally extends to the class of structured maximum margin prediction problems. Our technique is applied to navigation and planning problems for outdoor mobile robots and robotic legged locomotion.

**Maximum Margin Planning**

*Nathan Ratliff, J. Andrew (Drew) Bagnell, and Martin Zinkevich*

*International Conference on Machine Learning, July, 2006.*

**Abstract: **Imitation learning of sequential, goal-directed behavior by standard supervised techniques is often difficult. We frame learning such behaviors as a maximum margin structured prediction problem over a space of policies. In this approach, we learn mappings from features to cost so an optimal policy in an MDP with these cost mimics the expert's behavior.

Further, we demonstrate a simple, provably efficient approach to structured maximum margin learning, based on the subgradient method, that leverages existing fast algorithms for inference. Although the technique is general, it is particularly relevant in problems where A* and dynamic programming approaches make learning policies tractable in problems beyond the limitations of a QP formulation. We demonstrate our approach applied to route planning for outdoor mobile robots, where the behavior a designer wishes a planner to execute is often clear, while specifying cost functions that engender this behavior is a much more difficult task.

**Gaussian Processes in Graphical Models**

*Nathan Ratliff, Stanislav Funiak, Anton Chechetka*

*Probabilistic Graphical Models class project final report.*

**Abstract:** We propose a theoretical framework for understanding independencies between entire functions in Gaussian Processes based upon properties of the regularization operator for Gaussian Processes with stationary covariance. Furthermore, for the case of functions on a bounded domain, we show how the resulting theory can be made tractable by defining the function to be periodic beyond the domain and utilizing Fourier series expansions.

**Maximum Margin Planning**

*Nathan Ratliff, J. Andrew Bagnell, Martin Zinkevich*

*Neural Information Processing Systems, Vancouver, 2005*

*Workshop on Machine Learning Based Robotics in Unstructured Environments.*

**Abstract:** Mobile robots often rely upon systems that render sensor data and perceptual features into costs that can be used in a planner. The behavior that a designer wishes the planner to execute is often clear, while specifying costs that engender this behavior is a much more difficult task. This is particularly apparent when attempting to simultaneously tune many parameters that define the mapping from features to resulting plans. We provide a novel, structured maximum margin approach to learning based on example trajectories demonstrated by a human. The learning problem is transformed into a convex optimization problem and we provide a simple, efficient algorithm that leverages fast planning methods. Finally, we demonstrate the algorithms performance on learning to map features to plans on two different types of input features.