The robot watched as Shikhar Bahl opened the refrigerator door. It recorded his movements, the swing of the door, the location of the refrigerator and more, analyzed this data and prepared to imitate what Bahl had done.
It failed at first, sometimes missing the handle entirely, grabbing it in the wrong place or pulling it incorrectly. But after a few hours of practice, the robot succeeded and opened the door.
“Imitation is a great way to learn,” said Bahl, a Ph.D. student at the Robotics Institute (RI) in the School of Computer Science at Carnegie Mellon University. “Having robots learn by directly observing humans remains an unsolved problem in the field, but this work takes an important step toward enabling this capability.”
Bahl worked with Deepak Pathak and Abhinav Gupta, both RI faculty members, to develop a new learning method for robots called WHIRL, short for In-the-Wild Human Imitating Robot Learning. WHIRL is an efficient algorithm for one-shot visual imitation. It can learn directly from videos of human interaction and generalize this information to new tasks, making robots well suited to learning household chores. People constantly perform various tasks at home. With WHIRL, a robot can observe these tasks and gather the video data it needs to eventually work out how to do the job itself.
The team added a camera and their software to an off-the-shelf robot, and it learned to perform more than 20 tasks, from opening and closing appliances, cabinet doors and drawers to putting a lid on a pot, pushing in a chair and even pulling a trash bag out of the trash can. Each time, the robot watched a human perform the task once, after which it practiced and learned to perform the task on its own. The team presented their research this month at the Robotics: Science and Systems conference in New York.
“This work presents a way to bring robots into the home,” said Pathak, an assistant professor at RI and a member of the team. “Instead of waiting for robots to be programmed or trained to successfully perform various tasks before being deployed in people’s homes, this technology allows us to deploy the robots and have them learn how to perform tasks, all the while adapting to their surroundings and improving simply by watching.”
Current methods of teaching a task to a robot generally rely on imitation or reinforcement learning. In imitation learning, humans manually operate a robot to teach it to perform a task. This process must be repeated several times for a single task before the robot learns. In reinforcement learning, the robot is typically trained on millions of examples in simulation and then asked to adapt the training to the real world.
Both learning models work well when teaching a robot a single task in a structured environment, but they are difficult to scale and implement. WHIRL can learn from any video of a human doing a task. It is easily scalable, not limited to a specific task and can work in realistic home environments. The team is even working on a version of WHIRL trained by watching videos of human interaction on YouTube and Flickr.
Advances in computer vision have made the work possible. Using models trained on internet data, computers can now understand and model movement in 3D. The team used these models to understand human movement, facilitating WHIRL training.
With WHIRL, a robot can perform tasks in their natural environments. The appliances, doors, drawers, lids, chairs and garbage bags were not modified or manipulated to suit the robot. The robot’s first attempts at a task ended in failure, but once it had a few successes, it quickly figured out how to do the task and mastered it. The robot may not perform the task with the same movements as a human, but that is not the goal. Humans and robots have different parts, and they move differently. What matters is that the end result is the same. The door is opened. The switch is turned off. The faucet is turned on.
“To scale robotics in the wild, the data must be reliable and stable, and the robots must get better in their environment by practicing on their own,” Pathak said.
Source of the story:
Materials provided by Carnegie Mellon University. Originally written by Aaron Aupperlee. Note: Content may be edited for style and length.