Search a title or topic

Over 20 million podcasts, powered by 

Player FM logo
Artwork

Content provided by Google and Google AI. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Google and Google AI or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://podcastplayer.com/legal.
Player FM - Podcast App
Go offline with the Player FM app!

Building real-time voice applications with Live API

40:14
 
Share
 

Manage episode 498587556 series 3624003
Content provided by Google and Google AI. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Google and Google AI or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://podcastplayer.com/legal.

Shrestha Basu Mallick, one of the product leads for the Gemini API, joins host Logan Kilpatrick for a deep dive of Gemini Live API, Google’s real-time, multimodal interface for developers. Learn about how native audio alongside new capabilities like proactive audio and async function calling unlocks the unique power of audio as an interface.

Watch on YouTube: https://www.youtube.com/watch?v=4xlwlU6h-wM
0:00 - Intro
1:18 - Live API Overview
3:36 - Why audio is a special modality
5:07 - Speed vs. precision in audio
6:17 - Controllable and promptable TTS
8:31 - What developers are building with the Live API
11:14 - URL context and async calling features
15:02 - Proactive audio and affective dialog
16:55 - Addressing developer feedback
21:54 - Live API roadmap
23:49 - The role of long context
24:57 - What’s next for the Live API
26:41 - State of the AI audio market
30:10 - Advice for developers getting started with the Live API
31:16 - Live API demo
38:10 - Demo wrap up and closing

  continue reading

22 episodes

Artwork
iconShare
 
Manage episode 498587556 series 3624003
Content provided by Google and Google AI. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Google and Google AI or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://podcastplayer.com/legal.

Shrestha Basu Mallick, one of the product leads for the Gemini API, joins host Logan Kilpatrick for a deep dive of Gemini Live API, Google’s real-time, multimodal interface for developers. Learn about how native audio alongside new capabilities like proactive audio and async function calling unlocks the unique power of audio as an interface.

Watch on YouTube: https://www.youtube.com/watch?v=4xlwlU6h-wM
0:00 - Intro
1:18 - Live API Overview
3:36 - Why audio is a special modality
5:07 - Speed vs. precision in audio
6:17 - Controllable and promptable TTS
8:31 - What developers are building with the Live API
11:14 - URL context and async calling features
15:02 - Proactive audio and affective dialog
16:55 - Addressing developer feedback
21:54 - Live API roadmap
23:49 - The role of long context
24:57 - What’s next for the Live API
26:41 - State of the AI audio market
30:10 - Advice for developers getting started with the Live API
31:16 - Live API demo
38:10 - Demo wrap up and closing

  continue reading

22 episodes

All episodes

×
 
Loading …

Welcome to Player FM!

Player FM is scanning the web for high-quality podcasts for you to enjoy right now. It's the best podcast app and works on Android, iPhone, and the web. Signup to sync subscriptions across devices.

 

Copyright 2025 | Privacy Policy | Terms of Service | | Copyright
Listen to this show while you explore
Play