Multi-Fingered Active Grasp Learning

This is a review of a 2020 academic paper about using learning systems to train robotic arms and hands to grasp objects.

Learning-based approaches to grasp planning are preferred over analytical methods because they generalize better to novel, partially observed objects. However, data collection remains one of the biggest bottlenecks for grasp learning, particularly for multi-fingered hands. The relatively high-dimensional configuration space of these hands, coupled with the diversity of objects common in daily life, requires a large number of samples to produce robust and confident grasp success classifiers. In this paper, the researchers present the first active deep learning approach to grasping that searches over the grasp configuration space and classifier confidence in a unified manner. They base their approach on recent success in planning multi-fingered grasps as probabilistic inference with a learned neural network likelihood function, and they embed this within a multi-armed bandit formulation of sample selection. They show that their active grasp learner uses fewer training samples to reach grasp success rates comparable to a passive supervised learner trained on grasping data generated by an analytical planner. They additionally show that grasps generated by the active learner have greater qualitative and quantitative diversity in shape.
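To make the "grasp planning as probabilistic inference" idea more concrete, here is a minimal sketch of how a learned grasp success classifier can double as a planner: the grasp configuration is refined by gradient ascent on the classifier's predicted success probability. The network sizes, feature dimensions, and names such as `GraspSuccessNet` and `plan_grasp` are assumptions for illustration only, not the paper's actual implementation.

```python
# Hypothetical sketch: grasp planning as probabilistic inference with a learned
# neural-network success classifier. The classifier maps (grasp config, object
# features) to P(success); planning ascends its gradient w.r.t. the grasp config.
import torch
import torch.nn as nn

GRASP_DIM = 22       # hand joints + wrist pose (the dimensionality cited in the paper)
OBJ_FEAT_DIM = 64    # assumed size of a partial-view object embedding

class GraspSuccessNet(nn.Module):
    """Learned likelihood: (grasp config, object features) -> logit of grasp success."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(GRASP_DIM + OBJ_FEAT_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, grasp, obj_feat):
        return self.mlp(torch.cat([grasp, obj_feat], dim=-1))

def plan_grasp(net, obj_feat, steps=200, lr=1e-2):
    """Gradient-based inference: refine a random initial grasp configuration
    so as to maximize the classifier's predicted success probability."""
    grasp = torch.randn(1, GRASP_DIM, requires_grad=True)
    opt = torch.optim.Adam([grasp], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -torch.sigmoid(net(grasp, obj_feat)).mean()  # maximize P(success)
        loss.backward()
        opt.step()
    return grasp.detach(), torch.sigmoid(net(grasp, obj_feat)).item()

# Usage (untrained weights, purely illustrative):
net = GraspSuccessNet()
obj_feat = torch.randn(1, OBJ_FEAT_DIM)
best_grasp, p_success = plan_grasp(net, obj_feat)
```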

Arxiv – Multi-Fingered Active Grasp Learning

Learning-based grasp planning has become popular over the past decade because of its ability to generalize well to novel objects with only partial-view object information. These approaches require large amounts of data for training, particularly those that utilize deep neural networks. However, large-scale data collection remains a challenge for multi-fingered grasping, because (1) objects common in daily life exhibit large variation in geometry, texture, inertial properties, and appearance; and (2) multi-fingered grasp configurations are relatively high dimensional (e.g. 22 dimensions for the configuration of hand and wrist pose in this paper).
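As a point of reference for what a 22-dimensional grasp configuration could look like, the sketch below splits it into a 6-DOF wrist pose plus 16 finger joint angles. That particular split, and the `GraspConfig` class, are assumptions made for illustration; the paper's text above only states the total dimensionality.

```python
# Illustrative breakdown of a 22-dimensional grasp configuration: a 6-DOF wrist
# pose plus 16 hand joint angles. The exact split is an assumption for this sketch.
from dataclasses import dataclass
import numpy as np

@dataclass
class GraspConfig:
    wrist_pose: np.ndarray    # (6,)  position xyz + orientation (e.g. roll, pitch, yaw)
    joint_angles: np.ndarray  # (16,) finger joint angles

    def as_vector(self) -> np.ndarray:
        """Flatten to the 22-D vector a grasp success classifier would consume."""
        return np.concatenate([self.wrist_pose, self.joint_angles])

# Example: a random sample from this grasp configuration space
g = GraspConfig(wrist_pose=np.random.uniform(-1, 1, 6),
                joint_angles=np.random.uniform(0, 1.5, 16))
assert g.as_vector().shape == (22,)
```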

Newer active learning approaches interactively learn a grasp model that better covers the grasp configuration space across different objects while using fewer samples than a passive, supervised grasp learner. Instead of passively inducing a hypothesis to explain the available training data, as in standard supervised learning, active learning develops and tests new hypotheses continuously and interactively.

Active learning is most appropriate when 1) unlabeled data samples are numerous, 2) a large amount of labeled data is needed to train an accurate supervised learning system, and 3) data samples can be easily collected or synthesized. Grasp learning satisfies each of these conditions: 1) there are infinitely many possible grasps, 2) a large number of labeled training samples are necessary to cover the space, and 3) the robot is its own oracle: it can try a grasp and automatically detect success or failure without human labeling. A minimal loop built on these conditions is sketched below.
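The sketch assumes a placeholder success classifier, an uncertainty-based acquisition rule, and a stand-in `robot_oracle` function for the robot's self-labeling; none of these are the paper's actual components, which frame grasp selection as a multi-armed bandit over grasp configurations and classifier confidence.

```python
# Minimal sketch of an active grasp learning loop: sample candidate grasps,
# pick the one the acquisition rule scores highest, let the robot (here a
# stand-in simulator) execute and self-label it, then update the classifier.
import numpy as np

rng = np.random.default_rng(0)
GRASP_DIM = 22

def predict_success(model, grasps):
    """Placeholder logistic classifier: returns P(success) for each candidate grasp."""
    return 1.0 / (1.0 + np.exp(-(grasps @ model)))

def acquisition(p):
    """Prefer uncertain grasps (probability near 0.5), a simple exploration score."""
    return -np.abs(p - 0.5)

def robot_oracle(grasp):
    """Stand-in for executing the grasp and automatically detecting success."""
    return float(rng.random() < 0.3 + 0.4 * (grasp[:6].mean() > 0))

model = rng.normal(size=GRASP_DIM) * 0.01
dataset = []

for round_idx in range(50):
    candidates = rng.uniform(-1, 1, size=(256, GRASP_DIM))   # sampled grasp configs
    scores = acquisition(predict_success(model, candidates))
    chosen = candidates[np.argmax(scores)]                    # most informative grasp
    label = robot_oracle(chosen)                              # robot labels itself
    dataset.append((chosen, label))

    # Logistic-regression update on the recently self-collected data (simple SGD)
    for x, y in dataset[-16:]:
        p = 1.0 / (1.0 + np.exp(-(x @ model)))
        model += 0.1 * (y - p) * x
```

The uncertainty score here is just one simple acquisition choice; the paper's bandit formulation trades off exploring uncertain regions of the grasp configuration space against exploiting grasps the classifier already rates as likely successes.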

Tesla already has autolabelling of objects in the physical world.