DARPA robot can learn tasks from YouTube video and a neural net can understand video 20 times faster than a human

A startup called Clarifai is offering a service that uses deep learning to understand video.

The company says its software can rapidly analyze video clips to recognize 10,000 different objects or types of scene. In a demo given last week at a conference on deep learning, Clarifai’s cofounder and CEO Matthew Zeiler uploaded a clip that included footage of a varied alpine landscape. The software created a timeline with graph lines summarizing when different objects or types of scene were detected. It showed exactly when “snow” and “mountains” occurred individually and together. The software can analyze video faster than a human could watch it; in the demonstration, the 3.5-minute clip was processed in just 10 seconds.
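The timeline and the speed figure described above can be illustrated with a minimal sketch. It is not Clarifai's code or API: the per-second detection lists are made-up data, and the helper name `intervals` is hypothetical. It only shows how per-second labels could be collapsed into time ranges for each label and for their overlap, and how the 3.5-minute clip processed in 10 seconds works out to roughly the 20-times figure in the headline.

```python
from itertools import groupby


def intervals(flags):
    """Collapse per-second booleans into (start, end) ranges in seconds, end exclusive."""
    spans = []
    t = 0
    for present, run in groupby(flags):
        length = len(list(run))
        if present:
            spans.append((t, t + length))
        t += length
    return spans


# Hypothetical detections for a 10-second clip, one value per second (made-up data).
snow = [False, False, True, True, True, True, False, False, True, True]
mountains = [True, True, True, True, False, False, False, True, True, True]

# A label pair "occurs together" in any second where both are detected.
both = [s and m for s, m in zip(snow, mountains)]

snow_spans = intervals(snow)
mountain_spans = intervals(mountains)
both_spans = intervals(both)

# Speed relative to real-time viewing, using the figures from the demo.
clip_seconds = 3.5 * 60      # 3.5-minute clip
processing_seconds = 10      # reported processing time
speedup = clip_seconds / processing_seconds

print("snow:", snow_spans)
print("mountains:", mountain_spans)
print("both:", both_spans)
print("speedup:", speedup)
```

With the demo's numbers, the ratio is 210 seconds of video over 10 seconds of processing, or 21 times faster than watching the clip in real time.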

Clarifai is offering the technology as a service and expects it to be used for things like matching ads to content in online videos or developing new ways to organize video collections and edit footage.

Robots can learn tasks by watching video

DARPA-funded researchers recently developed a system that enabled robots to process visual data from a series of “how to” cooking videos on YouTube. Based on what was shown in a video, the robots were able to recognize, grab and manipulate the correct kitchen utensil or object and perform the demonstrated task with high accuracy, without additional human input or programming.

“The MSEE program initially focused on sensing, which involves perception and understanding of what’s happening in a visual scene, not simply recognizing and identifying objects,” said Reza Ghanadan, program manager in DARPA’s Defense Sciences Office. “We’ve now taken the next step to execution, where a robot processes visual cues through a manipulation action-grammar module and translates them into actions.”

Another significant advance to come out of the research is the robots’ ability to accumulate and share knowledge with others. Current sensor systems typically view the world anew in each moment, without the ability to apply prior knowledge.

University of Maryland computer scientist Yiannis Aloimonos (center) is developing robotic systems able to visually recognize objects and generate new behavior based on those observations. DARPA is funding this research through its Mathematics of Sensing, Exploitation and Execution (MSEE) program. (University of Maryland Photo)

“This system allows robots to continuously build on previous learning—such as types of objects and grasps associated with them—which could have a huge impact on teaching and training,” Ghanadan said. “Instead of the long and expensive process of programming code to teach robots to do tasks, this research opens the potential for robots to learn much faster, at much lower cost and, to the extent they are authorized to do so, share that knowledge with other robots. This learning-based approach is a significant step towards developing technologies that could have benefits in areas such as military repair and logistics.”

SOURCES – Technology Review, University of Maryland, DARPA, YouTube, Clarifai
