Stanford Seminar - Generalization through Task Representations with Foundation Models

May 23, 2025
Student Speaker - Wenlong Huang, Stanford University

Building robots that can operate autonomously in unstructured environments by following arbitrary natural language commands has long been the north star in robotic manipulation. While there has been tremendous progress in learning visuomotor policies that exhibit promising signs for open-world deployment, generalization to unseen tasks or motions largely remains unattainable or out of scope. In this talk, I will discuss how deliberate choices of task representations enable such zero-shot generalization at the task level, even without task-specific demonstrations. Notably, I will discuss our years-long investigations into extracting task representations from off-the-shelf foundation models, their evolution from a language-only representation to a 4D space-time domain, and their applications to model-based planning, affordance learning, and visuomotor policy learning. At the end of the talk, I will present an alternative view for scaling towards robotic intelligence: by leveraging foundation models to provide task-specific knowledge in the form of task representations, robotic data scaling can focus on learning from task-agnostic interactions with a world modeling objective. Collectively, this enables robots that not only understand the world as humans do but can also act within it with purpose and generality.

About the speaker: https://wenlong.page/

More about the course can be found here: https://stanfordasl.github.io/robotics_seminar/

View the entire AA289 Stanford Robotics and Autonomous Systems Seminar playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rMeercb-kvGLUrOq4HR6BZD

► Check out the entire catalog of courses and programs available through Stanford Online: https://online.stanford.edu/explore

View our Robotics and Autonomous Systems Graduate Certificate: https://online.stanford.edu/programs/robotics-and-autonomous-systems-graduate-certificate
