Avleen S. Bijral, Nathan Ratliff, and Nati Srebro
Matthew Zucker, Nathan Ratliff, Anca Dragan, Mihail Pivtoraiko, Matthew Klingensmith, Christopher Dellin, J. Andrew (Drew) Bagnell, and Siddhartha Srinivasa
Real-Time Natural Language Corrections for Assistive Manipulation TasksA. Broad, J. Arkin, N. Ratliff, T. Howard and B. Argall.
Accepted for publication in the International Journal of Robotics Research (IJRR) 2017.
Abstract: We propose a generalizable natural language interface that allows users to provide corrective instructions to an assistive robotic manipulator in real-time. This work is motivated by the desire to improve collaboration between humans and robots in a home environment. Allowing human operators to modify properties of how their robotic counterpart achieves a goal on-the-fly increases the utility of the system by incorporating the strengths of the human partner (e.g. visual acuity and environmental knowledge). This work is applicable to users with and without disability. Our natural language interface is based on the Distributed Correspondence Graph, a probabilistic graphical model that assigns semantic meaning to user utterances in the context of the robot’s environment and current behavior. We then use the desired corrections to alter the behavior of the robotic manipulator by treating the modifications as constraints on the motion generation (planning) paradigm. In this paper, we highlight four dimensions along which a user may wish to correct the behavior of his or her assistive manipulator. We develop our language model using data collected from Amazon Mechanical Turk to capture a comprehensive sample of terminology that people use to describe desired corrections. We then develop an end-to-end system using open-source speech-to-text software and a Kinova Robotics MICO robotic arm. To demonstrate the efficacy of our approach, we run a pilot study with users unfamiliar with robotic systems and analyze points of failure and future directions.
Proc. IROS 2016, October 2016.
Abstract: In this paper we present motion optimization algorithms for computing manipulation motions in presence of obstacles. Our approach builds a geometric representation of the workspace by constructing Riemannian metrics using electric potentials emanating from the workspace obstacles. Velocity of the robot’s body is measured with respect to this metric instead of traditional Cartesian velocity. Empirical results demonstrate that Riemannian metrics are better handled by optimizers that leverage objective and constraint functions’ second order information. This information encodes how the Riemannian geometry of the modeled workspace pulls back into the configuration space. We also show that despite the additional computational burden of computing the electric-potential based metric, it results in faster overall convergence than reasoning on Euclidean geometry alone. Evaluation is made efficient by cashing the electric potential in voxel grids and using Tri-cubic spline interpolation to assess the potentials gradient.
Proc. IROS 2016, October 2016.
Abstract: Learning of inverse dynamics modeling errors is key for compliant or force control when analytical models are only rough approximations. Thus, designing real time capable function approximation algorithms has been a necessary focus towards the goal of online model learning. However, because these approaches learn a mapping from actual state and acceleration to torque, good tracking is required to observe data points on the desired path. Recently it has been shown how online gradient descent on a simple modeling error offset term to minimize tracking at acceleration level can address this issue. However, to adapt to larger errors a high learning rate of the online learner is required, resulting in reduced compliancy. Thus, here we propose to combine both approaches: The online adapted offset term ensures good tracking such that a nonlinear function approximator is able to learn an error model on the desired trajectory. This, in turn, reduces the load on the adaptive feedback, enabling it to use a lower learning rate. Combined this creates a controller with variable feedback and low gains, and a feedforward model that can account for larger modeling errors. We demonstrate the effectiveness of this framework, in simulation and on a real system.
Tech Report: arXiv:1608.00309
(Patent pending technology)
[Older version. Please reference the ArXiv version]
Abstract: It has long been hoped that model-based control will improve tracking performance while maintaining or increasing compliance. This hope hinges on having or being able to estimate an accurate inverse dynamics model. As a result, substantial effort has gone into modeling and estimating dynamics (error) models. Most recent research has focused on learning the true inverse dynamics using data points mapping observed accelerations to the torques used to generate them. Unfortunately, if the initial tracking error is bad, such learning processes may train substantially off-distribution to predict well on actual observed acceleration rather then the desired accelerations. This work takes a different approach. We define a class of gradient-based online learning algorithms we term Direct Online Optimization for Modeling Errors in Dynamics (DOOMED) that directly minimize an objective measuring the divergence between actual and desired accelerations. Our objective is defined in terms of the true system's unknown dynamics and is therefore impossible to evaluate. However, we show that its gradient is measurable online from system data. We develop a novel adaptive control approach based on running online learning to directly correct (inverse) dynamics errors in real time using the data stream from the robot to accurately achieve desired accelerations during execution.
Tech Report: Max Planck Institute for Intelligent Systems. July 2015.
Abstract: Hessian information speeds convergence substantially in motion optimization. The better the Hessian approximation the better the convergence. But how good is a given approximation theoretically? How much are we losing? This paper addresses that question and proves that for a particularly popular and empirically strong approximation known as the Gauss-Newton approximation, we actually lose very little—for a large class of highly expressive objective terms, the true Hessian actually limits to the Gauss-Newton Hessian quickly as the trajectory’s time discretization becomes small. This result both motivates it’s use and offers insight into computationally efficient design. For instance, traditional representations of kinetic energy exploit the generalized inertia matrix whose derivatives are usually difficult to compute. We introduce here a novel reformulation of rigid body kinetic energy designed explicitly for fast and accurate curvature calculation. Our theorem proves that the Gauss-Newton Hessian under this formulation efficiently captures the kinetic energy curvature, but requires only as much computation as a single evaluation of the traditional representation. Additionally, we introduce a technique that exploits these ideas implicitly using Cholesky decompositions for some cases when similar objective terms reformulations exist but may be difficult to find. Our experiments validate these findings and demonstrate their use on a real-world motion optimization system for high-dof motion generation.
Proc. Robotics: Science and Systems (RSS) 2015, June 2015.
Abstract: Inverse Optimal Control (IOC) has strongly impacted the systems engineering process, enabling automated planner tuning through straightforward and intuitive demonstration. The most successful and established applications, though, have been in lower dimensional problems such as navigation planning where exact optimal planning or control is feasible. In higher dimensional systems, such as humanoid robots, research has made substantial progress toward generalizing the ideas to model free or locally optimal settings, but these systems are complicated to the point where demonstration itself can be difficult. Typically, real-world applications are restricted to at best noisy or even partial or incomplete demonstrations that prove cumbersome in existing frameworks. This work derives a very flexible method of IOC based on a form of Structured Prediction known as Direct Loss Minimization. The resulting algorithm is essentially Policy Search on a reward function that rewards similarity to demonstrated behavior (using Covariance Matrix Adaptation (CMA) in our experiments). Our framework blurs the distinction between IOC, other forms of Imitation Learning, and Reinforcement Learning, enabling us to derive simple, versatile, and practical algorithms that blend imitation and reinforcement signals into a unified framework. Our experiments analyze various aspects of its performance and demonstrate its efficacy on conveying preferences for motion shaping and combined reach and grasp quality optimization.
[longer tech report][project page][movies]
Proc. ICRA 2015, May 2015.
Abstract: What is it that makes movement around obstacles hard? The answer seems clear: obstacles contort the geometry of the workspace and make it difficult to leverage what we consider easy and intuitive straight-line Cartesian geometry. But is Cartesian motion actually easy? It's certainly well-understood and has numerous applications. But beneath the details of linear algebra and pseudoinverses, lies a non-trivial Riemannian metric driving the solution. Cartesian motion is easy only because the pseudoinverse, our powerhouse tool, correctly represents how Euclidean workspace geometry pulls back into the configuration space. In light of that observation, it reasons that motion through a field of obstacles could be just as easy as long as we correctly account for how those obstacles warp the geometry of the space. This paper explores extending our geometric model of the robot beyond the notion of a Cartesian workspace space to fully model and leverage how geometry changes in the presence of obstacles. Intuitively, impenetrable obstacles form topological holes and geodesics curve accordingly around them. We formalize this intuition and develop a general motion optimization framework called Riemannian Motion Optimization (RieMO) to efficiently find motions using our geometric models. We also prove that Gauss-Newton approximations for Hessian calculations across the entire trajectory are natural representations of the problem's first-order Riemannian Geometry and capture all of the relevant curvature information encoded in the Hessian. Our experiments demonstrate that, for many problems, obstacle avoidance can be much more natural when placed within the right geometric context.
Proc. IROS 2014, September 2014.
Abstract: Efficient manipulation requires contact to reduce uncertainty. The manipulation literature refers to this as funneling: a methodology for increasing reliability and robustness by leveraging haptic feedback and control of environmental interaction. However, there is a fundamental gap between traditional approaches to trajectory optimization and this concept of robustness by funneling: traditional trajectory optimizers do not discover force feedback strategies. From a POMDP perspective, these behaviors could be regarded as explicit observation actions planned to sufficiently reduce uncertainty thereby enabling a task. While we are sympathetic to the full POMDP view, solving full continuous-space POMDPs in high-dimensions is hard. In this paper, we propose an alternative approach in which trajectory optimization objectives are augmented with new terms that reward uncertainty reduction through contacts, explicitly promoting funneling. This augmentation shifts the responsibility of robustness toward the actual execution of the optimized trajectories. Directly tracing trajectories through configuration space would lose all robustness---dual execution achieves robustness by devising force controllers to reproduce the temporal interaction profile encoded in the dual solution of the optimization problem. This work introduces dual execution in depth and analyze its performance through robustness experiments in both simulation and on a real-world robotic platform.
ICML Workshop on Robot Learning, June 2013.
Abstract: Obstacles in an environment inherently affect the its geometry. A straight line between two points is no longer optimal and may not even be feasible. This paper explores that idea directly and develops algorithms to learn and efficiently leverage the underlying geometry of the environment for planning and control. Specifically, we uncover a problem’s geometry by formulating the problem as one of finding a diffeomorphism (smooth and invertible mapping) from the configuration space to a latent Euclidean space where curved geodesics in the configuration space map to well understood and computationally efficient straight lines in the latent space. Diffeomorphisms in this context act to reduce more complex problems to simpler more fundamental and easily controlled ones. We provided some preliminary experimental results using these techniques and discuss their advantages, limitations, and generalizations.
International Journal of Robotics Research, May 2013.
Abstract: In this paper, we present CHOMP (Covariant Hamiltonian Optimization for Motion Planning), a method for trajectory optimization invariant to reparametrization. CHOMP uses functional gradient techniques to iteratively improve the quality of an initial trajectory, optimizing a functional that trades off between a smoothness and an obstacle avoidance component. CHOMP can be used to locally optimize feasible trajectories, as well as to solve motion planning queries, converging to low- cost trajectories even when initialized with infeasible ones. It uses Hamiltonian Monte Carlo to alleviate the problem of convergence to high-cost local minima (and for probabilistic completeness), and is capable of respecting hard constraints along the trajectory. We present extensive experiments with CHOMP on manipulation and locomotion tasks, using 7-DOF manipulators and a rough-terrain quadruped robot.
Domain Adaptation Workshop at NIPS '11, December 2011.
Abstract: We study a novel variant of the domain adaptation problem, in which the loss function on test data changes due to dependencies on prior predictions. One important instance of this problem area occurs in settings where it is more costly to make a new error than to repeat a previous error. We propose several methods for learning effectively in this setting, and test them empirically on the real-world tasks of malicious URL classification and adversarial advertisement detection.
27th Conference on Uncertainty in Artificial Intelligence (UAI 2011), July, 2011.
Abstract: We present a simple yet eﬀective approach to Semi-Supervised Learning. Our approach is based on estimating density-based distances (DBD) using a shortest path calculation on a graph. These Graph-DBD estimates can then be used in any distancebased supervised learning method, such as Nearest Neighbor methods and SVMs with RBF kernels. In order to apply the method to very large data sets, we also present a novel algorithm which integrates nearest neighbor computations into the shortest path search and can ﬁnd exact shortest paths even in extremely large dense graphs. Signiﬁcant runtime improvement over the commonly used Laplacian regularization method is then shown on a large scale dataset.
Anca Dragan, Nathan Ratliff, and Siddhartha Srinivasa
2011 IEEE International Conference on Robotics and Automation, May, 2011.
Abstract: Goal sets are omnipresent in manipulation: picking up objects, placing them on counters or in bins, handing them off — all of these tasks encompass continuous sets of goals. This paper describes how to design optimal trajectories that exploit goal sets. We extend CHOMP (Covariant Hamiltonian Optimization for Motion Planning), a recent trajectory optimizer that has proven effective on high-dimensional problems, to handle trajectory-wide constraints, and relate the solution to the intuition of taking unconstrained steps and subsequently projecting them onto the constraints. We then show how this projection simpliﬁes for goal sets (i.e. constraints that affect only the end-point). Finally, we present experiments on a personal robotics platform that show the importance of exploiting goal sets in trajectory optimization for day-to-day manipulation tasks.
Brian D. Ziebart, Nathan Ratliff, Garratt Gallagher, Christoph Mertz, Kevin Peterson, J. Andrew (Drew) Bagnell, Martial Hebert, Anind Dey, and Siddhartha Srinivasa
Proc. IROS 2009, October, 2009.
Abstract: In this paper, we describe a novel uncertainty-based technique for predicting the future motions of a moving person. Our model assumes that people behave purposefully – efficiently acting to reach intended destinations. We employ the Markov decision process framework and the principle of maximum entropy to obtain a probabilistic, approximately optimal model of human behavior that admits efficient inference and learning algorithms. The method learns a cost function of features of the environment that explains previously observed behavior. This enables generalization to physical changes in the environment, and entirely different environments. Our approach enables robots to plan paths that balance time-to-goal and pedestrian disruption. We quantitatively show the improvement provided by our approach.
Nathan Ratliff, David Silver, and J. Andrew (Drew) Bagnell
Autonomous Robots, Vol. 27, No. 1, July, 2009, pp. 25-53.
Abstract: Programming robot behavior remains a challenging task. While it is often easy to abstractly define or even demonstrate a desired behavior, designing a controller that embodies the same behavior is difficult, time consuming, and ultimately expensive. The machine learning paradigm offers the promise of enabling “programming by demonstration” for developing high-performance robotic systems.
Unfortunately, many “behavioral cloning” (Bain and Sammut in Machine intelligence agents. London: Oxford University Press, 1995; Pomerleau in Advances in neural information processing systems 1, 1989; LeCun et al. in Advances in neural information processing systems 18, 2006) approaches that utilize classical tools of supervised learning (e.g. decision trees, neural networks, or support vector machines) do not fit the needs of modern robotic systems. These systems are often built atop sophisticated planning algorithms that efficiently reason far into the future; consequently, ignoring these planning algorithms in lieu of a supervised learning approach often leads to myopic and poor-quality robot performance.
While planning algorithms have shown success in many real-world applications ranging from legged locomotion (Chestnutt et al. in Proceedings of the IEEE-RAS international conference on humanoid robots, 2003) to outdoor unstructured navigation (Kelly et al. in Proceedings of the international symposium on experimental robotics (ISER), 2004; Stentz et al. in AUVSI’s unmanned systems, 2007), such algorithms rely on fully specified cost functions that map sensor readings and environment models to quantifiable costs. Such cost functions are usually manually designed and programmed.
Recently, a set of techniques has been developed that explore learning these functions from expert human demonstration. These algorithms apply an inverse optimal control approach to find a cost function for which planned behavior mimics an expert’s demonstration. The work we present extends the Maximum Margin Planning (MMP) (Ratliff et al. in Twenty second international conference on machine learning (ICML06), 2006a) framework to admit learning of more powerful, non-linear cost functions. These algorithms, known collectively as LEARCH (LEArning to seaRCH), are simpler to implement than most existing methods, more efficient than previous attempts at non-linearization (Ratliff et al. in NIPS, 2006b), more naturally satisfy common constraints on the cost function, and better represent our prior beliefs about the function’s form.
We derive and discuss the framework both mathematically and intuitively, and demonstrate practical real-world performance with three applied case-studies including legged locomotion, grasp planning, and autonomous outdoor unstructured navigation. The latter study includes hundreds of kilometers of autonomous traversal through complex natural environments. These case-studies address key challenges in applying the algorithm in practical settings that utilize state-of-the-art planners, and which may be constrained by efficiency requirements and imperfect expert demonstration.
Young-Woo Seo, Nathan Ratliff, and Christopher Urmson
Proceedings of International Joint Conference on Artificial Intelligence (IJCAI-09), July, 2009.
Abstract: Road network information simplifies autonomous driving by providing strong priors about environments. It informs a robotic vehicle with where it can drive, models of what can be expected, and contextual cues that influence driving behaviors. Currently, however, road network information is manually generated using a combination of GPS survey and aerial imagery. These manual techniques are labor intensive and error prone. To fully exploit the benefits of digital imagery, these processes should be automated. As a step toward this goal, we present an algorithm that extracts the structure of a parking lot visible from a given aerial image. To minimize human intervention in the use of aerial imagery, we devise a self-supervised learning algorithm that automatically generates a set of parking spot templates to learn the appearance of a parking lot and estimates the structure of the parking lot from the learned model. The data set extracted from a single image alone is too small to sufficiently learn an accurate parking spot model. However, strong priors trained using large data sets collected across multiple images dramatically improve performance. Our self-supervised approach outperforms the prior alone by adapting the distribution of examples toward that found in the current image. A thorough empirical analysis compares leading state-of-the-art learning techniques on this problem.
Nathan Ratliff, Matthew Zucker, J. Andrew (Drew) Bagnell, and Siddhartha Srinivasa
IEEE International Conference on Robotics and Automation (ICRA), May, 2009.
Abstract: Existing high-dimensional motion planning algorithms are simultaneously overpowered and underpowered. In domains sparsely populated by obstacles, the heuristics used by sampling-based planners to navigate “narrow passages” can be needlessly complex; furthermore, additional post-processing is required to remove the jerky or extraneous motions from the paths that such planners generate. In this paper, we present CHOMP, a novel method for continuous path refinement that uses covariant gradient techniques to improve the quality of sampled trajectories. Our optimization technique converges over a wider range of input paths and is able to optimize higher-order dynamics of trajectories than previous path optimization strategies. As a result, CHOMP can be used as a standalone motion planner in many real-world planning queries. The effectiveness of our proposed method is demonstrated in manipulation planning for a 6-DOF robotic arm as well as in trajectory generation for a walking quadruped robot.
Doctoral dissertation, Carnegie Mellon University, Robotics Institute 2009
Abstract: Modern robots successfully manipulate objects, navigate rugged terrain, drive in urban settings, and play world-class chess. Unfortunately, programming these robots is challenging, time-consuming and expensive; the parameters governing their behavior are often unintuitive, even when the desired behavior is clear and easily demonstrated. Inspired by successful end-to-end learning systems such as neural network controlled driving platforms (Pomerleau, 1989), learning-based “programming by demonstration” has gained currency as a method to achieve intelligent robot behavior. Unfortunately, with highly structured algorithms at their core, modern robotic systems are hard to train using classical learning techniques. Rather than redefining robot architectures to accommodate existing learning algorithms, this thesis develops learning techniques that leverage the performance of modern robotic components.
We begin with a discussion of a novel imitation learning framework we call Maximum Margin Planning which automates finding a cost function for optimal planning and control algorithms such as A*. In the linear setting, this framework has firm theoretical backing in the form of strong generalization and regret bounds. Further, we have developed practical nonlinear generalizations that are effective and efficient for real-world problems. This framework reduces imitation learning to a modern form of machine learning known as Maximum Margin Structured Classification (Taskar et al., 2005); these algorithms, therefore, apply both specifically to training existing state-of-the-art planners as well as broadly to solving a range of structured prediction problems of importance in learning and robotics.
In difficult high-dimensional planning domains, such as those found in many manipulation problems, high-performance planning technology remains a topic of much research. We close with some recent work which moves toward simultaneously advancing this technology while retaining the learnability developed above.
Throughout the thesis, we demonstrate our algorithms on a range of applications including overhead navigation, quadrupedal locomotion, heuristic learning, manipulation planning, grasp prediction, driver prediction, pedestrian prediction, optical character recognition, and LADAR classification.
Nathan Ratliff, Brian D. Ziebart, Kevin Peterson, J. Andrew (Drew) Bagnell, Martial Hebert, Anind Dey, and Siddhartha Srinivasa
Twelfth International Conference on Artificial Intelligence and Statistics (AIStats), April, 2009.
Abstract: One common approach to imitation learning is behavioral cloning (BC), which employs straight-forward supervised learning (i.e., classification) to directly map observations to controls. A second approach is inverse optimal control (IOC), which formalizes the problem of learning sequential decision-making behavior over long horizons as a problem of recovering a utility function that explains observed behavior. This paper presents inverse optimal heuristic control (IOHC), a novel approach to imitation learning that capitalizes on the strengths of both paradigms. It employs long-horizon IOC-style modeling in a low-dimensional space where inference remains tractable, while incorporating an additional descriptive set of BC-style features to guide a higher-dimensional overall action selection. We provide experimental results demonstrating the capabilities of our model on a simple illustrative problem as well as on two real world problems: turn-prediction for taxi drivers, and pedestrian prediction within an office environment.
Rosen Diankov, Nathan Ratliff, David Ferguson, Siddhartha Srinivasa, and James Kuffner
Robotics: Science and Systems, June, 2008.
Abstract: We present a planning algorithm called BiSpace that produces fast plans to complex high-dimensional problems by simultaneously exploring multiple spaces. We specifically focus on finding robust solutions to manipulation and grasp planning problems by using BiSpace? special characteristics to explore the work and configuration spaces of the environment and robot. Furthermore, we present a number of techniques for constructing informed heuristics to intelligently search through these high-dimensional spaces. In general, the BiSpace planner is applicable to any problem involving multiple search spaces.
Nathan Ratliff, J. Andrew (Drew) Bagnell, and Siddhartha Srinivasa
IEEE-RAS International Conference on Humanoid Robots, November, 2007.
Abstract: Decision making in robotics often involves computing an optimal action for a given state, where the space of actions under consideration can potentially be large and state dependent. Many of these decision making problems can be naturally formalized in the multiclass classification framework, where actions are regarded as labels for states. One powerful approach to multiclass classification relies on learning a function that scores each action; action selection is done by returning the action with maximum score. In this work, we focus on two imitation learning problems in particular that arise in robotics. The first problem is footstep prediction for quadruped locomotion, in which the system predicts next footstep locations greedily given the current four-foot configuration of the robot over a terrain height map. The second problem is grasp prediction, in which the system must predict good grasps of complex free-form objects given an approach direction for a robotic hand. We present experimental results of applying a recently developed functional gradient technique for optimizing a structured margin formulation of the corresponding large non-linear multiclass classification problems.
Nathan Ratliff, J. Andrew (Drew) Bagnell, and Martin Zinkevich
Eleventh International Conference on Artificial Intelligence and Statistics (AIStats), March, 2007.
Abstract: Promising approaches to structured learning problems have recently been developed in the maximum margin framework. Unfortunately, algorithms that are computationally and memory efficient enough to solve large scale problems have lagged behind. We propose using simple subgradient-based techniques for optimizing a regularized risk formulation of these problems in both online and batch settings, and analyze the theoretical convergence, generalization, and robustness properties of the resulting techniques. These algorithms are are simple, memory efficient, fast to converge, and have small regret in the online setting. We also investigate a novel convex regression formulation of structured learning. Finally, we demonstrate the benefits of the subgradient approach on three structured prediction problems.
Nathan Ratliff and J. Andrew (Drew) Bagnell
International Joint Conference on Artificial Intelligence, January, 2007.
Abstract: We propose a novel variant of the conjugate gradient algorithm,Kernel Conjugate Gradient (KCG), designed to speed up learning for kernel machines with differentiable loss functions. This approach leads to a better conditioned optimization problem during learning. We establish an upper bound on the number of iterations for KCG that indicates it should require less than the square root of the number of iterations that standard conjugate gradient requires. In practice, for various differentiable kernel learning problems, we find KCG consistently, and significantly, outperforms existing techniques. The algorithm is simple to implement, requires no more computation per iteration than standard approaches, and is well motivated by Reproducing Kernel Hilbert Space (RKHS) theory. We further show that data-structure techniques recently used to speed up kernel machine approaches are well matched to the algorithm by reducing the dominant costs of training: function evaluation and RKHS inner product computation.
Nathan Ratliff, David Bradley, J. Andrew (Drew) Bagnell, and Joel Chestnutt
Advances in Neural Information Processing Systems 19, 2007.
Abstract: The Maximum Margin Planning (MMP) algorithm solves imitation learning problems by learning linear mappings from features to cost functions in a planning domain. The learned policy is the result of minimum-cost planning using these cost functions. These mappings are chosen so that example policies (or trajectories) given by a teacher appear to be lower cost (with a loss-scaled margin) than any other policy for a given planning domain. We provide a novel approach, MMPBoost, based on the functional gradient descent view of boosting that extends MMP by ``boosting'' in new features. This approach uses simple binary classification or regression to improve performance of MMP imitation learning, and naturally extends to the class of structured maximum margin prediction problems. Our technique is applied to navigation and planning problems for outdoor mobile robots and robotic legged locomotion.
Nathan Ratliff, J. Andrew (Drew) Bagnell, and Martin Zinkevich
International Conference on Machine Learning, July, 2006.
Abstract: Imitation learning of sequential, goal-directed behavior by standard supervised techniques is often difficult. We frame learning such behaviors as a maximum margin structured prediction problem over a space of policies. In this approach, we learn mappings from features to cost so an optimal policy in an MDP with these cost mimics the expert's behavior.
Further, we demonstrate a simple, provably efficient approach to structured maximum margin learning, based on the subgradient method, that leverages existing fast algorithms for inference. Although the technique is general, it is particularly relevant in problems where A* and dynamic programming approaches make learning policies tractable in problems beyond the limitations of a QP formulation. We demonstrate our approach applied to route planning for outdoor mobile robots, where the behavior a designer wishes a planner to execute is often clear, while specifying cost functions that engender this behavior is a much more difficult task.
Gaussian Processes in Graphical Models
Nathan Ratliff, Stanislav Funiak, Anton Chechetka
Probabilistic Graphical Models class project final report.
Abstract: We propose a theoretical framework for understanding independencies between entire functions in Gaussian Processes based upon properties of the regularization operator for Gaussian Processes with stationary covariance. Furthermore, for the case of functions on a bounded domain, we show how the resulting theory can be made tractable by defining the function to be periodic beyond the domain and utilizing Fourier series expansions.
Maximum Margin Planning
Nathan Ratliff, J. Andrew Bagnell, Martin Zinkevich
Neural Information Processing Systems, Vancouver, 2005
Workshop on Machine Learning Based Robotics in Unstructured Environments.
Abstract: Mobile robots often rely upon systems that render sensor data and perceptual features into costs that can be used in a planner. The behavior that a designer wishes the planner to execute is often clear, while specifying costs that engender this behavior is a much more difficult task. This is particularly apparent when attempting to simultaneously tune many parameters that define the mapping from features to resulting plans. We provide a novel, structured maximum margin approach to learning based on example trajectories demonstrated by a human. The learning problem is transformed into a convex optimization problem and we provide a simple, efficient algorithm that leverages fast planning methods. Finally, we demonstrate the algorithms performance on learning to map features to plans on two different types of input features.