Learning from Human Teleoperation
Predicting which foothold a human operator would choose next.
Nathan Ratliff, Joel Chestnutt, J. Andrew Bagnell
Early experiments revealed that our footstep cost functions erroneously assumed the terrain would have a high friction coefficient. When this assumption did not hold, the robot's performance degraded substantially due to slippage during execution. The relative ease of robot teleoperation allowed us to demonstrate robust solutions with footholds qualitatively different from those found by the automated footstep planner. We used LEARCH optimization under the MMP framework to generalize this demonstrated behavior in two ways. First, we modeled the cost of a step using not only features of the terrain, but also features of the action. This combination of features enabled us both to interpret the surrounding terrain and to encode constraints on the kinematics of the robot. Following this approach, we trained a next-foothold prediction policy that was stable enough to greedily generate a sequence of footsteps across rugged terrain given a fixed foot-order sequence (Ratliff et al., 2007d). Second, we demonstrated that although the learned predictor was trained only on examples of rugged terrain, it learned a robust action model that could also traverse flat terrain at an even cadence. The videos below demonstrate the learning process and the performance of the learned footstep prediction policy across rugged terrain.
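The greedy next-foothold selection described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the terrain features (height, local relief, roughness), the action features (displacement from the stance foot), and the linear scoring in place of the learned LEARCH hypothesis are all assumptions chosen for clarity.

```python
import numpy as np

def terrain_features(terrain, xy):
    """Hypothetical terrain features at a candidate foothold:
    cell height, local relief (peak-to-peak), and local roughness."""
    x, y = xy
    patch = terrain[max(x - 1, 0):x + 2, max(y - 1, 0):y + 2]
    return np.array([terrain[x, y], np.ptp(patch), patch.var()])

def action_features(stance_xy, xy):
    """Hypothetical action features: displacement of the candidate
    foothold from the current stance foot (encodes kinematic reach)."""
    dx, dy = xy[0] - stance_xy[0], xy[1] - stance_xy[1]
    return np.array([dx, dy, dx * dx + dy * dy])

def greedy_footsteps(terrain, start_xy, w, n_steps, reach=3):
    """Greedily pick the lowest-cost next foothold under a linear
    score over concatenated terrain and action features."""
    steps = [start_xy]
    for _ in range(n_steps):
        sx, sy = steps[-1]
        best, best_cost = None, np.inf
        for x in range(max(sx - reach, 0), min(sx + reach + 1, terrain.shape[0])):
            for y in range(max(sy - reach, 0), min(sy + reach + 1, terrain.shape[1])):
                if (x, y) == (sx, sy):
                    continue
                f = np.concatenate([terrain_features(terrain, (x, y)),
                                    action_features((sx, sy), (x, y))])
                cost = float(w @ f)
                if cost < best_cost:
                    best, best_cost = (x, y), cost
        steps.append(best)
    return steps
```

Because the greedy argmin is invariant to monotone transforms, the same footholds would be selected whether the linear score or its exponentiated, always-positive form is used as the cost.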
Unfortunately, the aggressive stride of the demonstrated footsteps was too long for our controller to execute autonomously; this incompatibility prevented us from running the learned policy on the physical robot. However, by removing the kinematic features and re-learning a cost function solely as a function of terrain features, we successfully learned a replacement for the footstep planner's terrain cost map. Since the planner's action model was a priori feasible for our controller, we could readily execute the planned footstep sequence, which generalized the demonstrated behavior, on the physical robot. The video at the top of this page contrasts the robot's performance after learning with its performance before learning on a particularly problematic terrain. The video clearly demonstrates that the sequence of footsteps planned under the learned cost function was significantly more robust than the sequence planned under the original hand-tuned cost function. In particular, the learned cost function encouraged the robot to avoid peaks in favor of terrain concavities; any slip was therefore more likely to settle back into a stable configuration at the bottom of a basin.
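The terrain-only re-learning step might be sketched as below. The feature set and the exponentiated linear form standing in for the learned cost function are illustrative assumptions; the point is that the result is a plain per-cell cost grid the existing footstep planner can consume as a drop-in replacement for its hand-tuned terrain cost map.

```python
import numpy as np

def learned_cost_map(terrain, w):
    """Evaluate an exponentiated linear cost at every cell using
    terrain-only features (height, local relief, roughness), producing
    a grid a standard footstep planner can use as its cost map."""
    rows, cols = terrain.shape
    cost = np.empty_like(terrain)
    for x in range(rows):
        for y in range(cols):
            patch = terrain[max(x - 1, 0):x + 2, max(y - 1, 0):y + 2]
            f = np.array([terrain[x, y], np.ptp(patch), patch.var()])
            # Exponentiation keeps every cost strictly positive, as
            # planners generally require.
            cost[x, y] = np.exp(w @ f)
    return cost
```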
The above images depict a slightly revised set of results that parallel Figure 3 of (Ratliff et al., 2007d). In the original work, we used an unexponentiated variant of LEARCH with a simple gradient-descent-based nonlinear neural network base learner. The reduced dynamic range of the unexponentiated hypothesis space in the original experiments dictated the use of a more complicated loss function that tapered to a constant value as it receded from the desired footstep location. In these revised experiments, we used the exponentiated variant. The resulting increase in dynamic range allowed us to use a straightforward quadratic loss function in conjunction with a more sophisticated neural network training technique based on Levenberg-Marquardt optimization. The first row of these images shows a sequence of footsteps predicted across rugged terrain, and the second row shows the same footsteps overlaid on the corresponding cost regions. The final row demonstrates the performance of our prediction policy in the absence of terrain information by predicting a sequence of footsteps across flat ground. The improved dynamic range of the cost function is clear in these images.
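The two loss shapes contrasted above can be made concrete with a small sketch. The quadratic loss grows without bound with distance from the demonstrated foothold, which the exponentiated hypothesis space can accommodate, while the tapered variant saturates at a constant far from the target, as the limited dynamic range of the unexponentiated hypothesis required. The cap value here is illustrative, not taken from the original experiments.

```python
import numpy as np

def quadratic_loss(xy, target_xy):
    """Quadratic loss augmentation: grows without bound away from
    the demonstrated foothold location."""
    d = np.subtract(xy, target_xy)
    return float(d @ d)

def tapered_loss(xy, target_xy, cap=4.0):
    """Tapered variant: identical near the target, but saturates at a
    constant (cap is an illustrative value) far from it, as a
    limited-dynamic-range cost hypothesis requires."""
    return min(quadratic_loss(xy, target_xy), cap)
```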