Wednesday, November 13, 2024

Helping Robots Focus On Key Objects

MIT engineers have created Clio, a method that lets robots make smart choices like humans and remember only important details to complete tasks.

Caption: From left to right: team members Lukas Schmid, Nathan Hughes, Dominic Maggio, Yun Chang, and Luca Carlone.
Credit: Andy Ryan

Imagine you need to clean a kitchen countertop cluttered with different sauce packets. If you simply want to clear the counter, you might collect all the packets and sweep them off together. However, if you want to separate the mustard packets first, you would sort them by sauce type. And if you are specifically looking for Grey Poupon mustard, you would conduct a more detailed search to find that exact brand.

MIT engineers have developed a method named Clio, after the Greek muse of history, that enables robots to make intuitive, task-relevant decisions similar to humans. Clio allows a robot to process a list of tasks described in natural language and then determine the level of detail needed to interpret its environment effectively. The robot then “remembers” only the parts of a scene pertinent to the tasks at hand. 

Open fields

Recent research has shifted toward “open-set” recognition, which uses deep-learning techniques to build neural networks that can process billions of internet-sourced images along with their descriptive text, such as a Facebook photo of a dog captioned “Meet my new puppy!” From millions of such image-text pairs, a network learns to pinpoint the scene elements linked to specific terms, so a robot can identify a dog in an entirely new setting. Despite these advances, one challenge persists: parsing a scene in a way that is directly useful for a specific task.

Information bottleneck

The team’s approach combines computer vision with neural networks to analyze millions of open-source images and texts, and uses mapping tools to split each image into segments for processing. The researchers then apply the “information bottleneck” principle from information theory to retain only the segments that matter for the specified tasks. Clio was tested in real-world applications such as organizing a cluttered apartment and assisting Boston Dynamics’ robot, Spot, in an office environment. Running in real time on Spot’s onboard computer, Clio identified and mapped the target objects, enabling the robot to complete its tasks.
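To make the idea of keeping only task-relevant segments more concrete, here is a minimal Python sketch. It is not Clio's actual algorithm: the three-dimensional "embeddings", the function names, and the fixed similarity threshold are all hypothetical stand-ins, and a simple cosine-similarity cutoff is used in place of the information bottleneck optimization. It only illustrates the trade-off the principle describes: compress the scene as much as possible while keeping what the tasks need.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_relevant_segments(segments, tasks, threshold=0.8):
    """Group segment names under the task they match best; drop the rest.

    `segments` and `tasks` map names to embedding vectors. In a real system
    these would come from a vision-language model; here they are toy values.
    Raising `threshold` keeps fewer segments (a more compressed map).
    """
    kept = {task_name: [] for task_name in tasks}
    for seg_name, seg_vec in segments.items():
        best_task, best_score = None, threshold
        for task_name, task_vec in tasks.items():
            score = cosine(seg_vec, task_vec)
            if score >= best_score:
                best_task, best_score = task_name, score
        if best_task is not None:
            kept[best_task].append(seg_name)
    return kept


# Hypothetical toy embeddings for three image segments and two tasks.
segments = {
    "mustard_packet": [0.9, 0.1, 0.0],
    "ketchup_packet": [0.1, 0.9, 0.0],
    "wall": [0.0, 0.0, 1.0],
}
tasks = {
    "find mustard": [1.0, 0.0, 0.0],
    "find ketchup": [0.0, 1.0, 0.0],
}

result = select_relevant_segments(segments, tasks)
print(result)
```

In this toy scene the wall segment matches neither task and is left out of the map, while each sauce packet is stored under the task it serves.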

Looking ahead, the team intends to enhance Clio’s capabilities to manage more complex tasks and incorporate the latest developments in photorealistic visual scene representations.

Nidhi Agarwal
Nidhi Agarwal is a journalist at EFY. She is an Electronics and Communication Engineer with over five years of academic experience. Her expertise lies in working with development boards and IoT cloud. She enjoys writing because it lets her share her knowledge of and insights into electronics with like-minded techies.
