September 7th, 2023 - SLiMe, Matcha-TTS, RoboSense, and CM3Leon: Revolutionizing Vision, Speech, and Multi-Modal Intelligence for a Smarter, Faster Future
Chapters
1. Intro (00:00:00)
2. SLiMe: Segment Like Me (00:01:22)
3. Matcha-TTS: A fast TTS architecture with conditional flow matching (00:03:01)
4. Physically Grounded Vision-Language Models for Robotic Manipulation (00:04:45)
5. Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning (00:05:49)