Google's RT-2 Robot Model Learns Tasks from Web Images and Text
Google DeepMind
Jan 5, 2024 00:00
Google Robotics Team
robotics, google, rt-2, vision-language, generalization, manipulation
Summary
Google's RT-2 lets robots draw on knowledge from web-scale data, improving their generalization to new tasks.
Google DeepMind has introduced RT-2 (Robotics Transformer 2), a vision-language-action model trained on web-scale image and text data together with robot demonstration data, so that knowledge from the web carries over to robot control. Unlike traditional robotics systems that require extensive task-specific training, RT-2 can generalize from web-scale data to manipulation tasks it was not explicitly trained on. It can also interpret abstract concepts and turn them into physical actions: given the instruction "pick up the extinct animal" and a set of objects that includes toy dinosaurs, it picks a dinosaur. The approach is a step toward robots that adapt to new instructions without task-specific retraining.