Content provided by Pure Storage. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Pure Storage or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://podcastplayer.com/legal.
Accelerating Enterprise AI Inference with Pure KVA
In this episode, we sit down with Solution Architect Robert Alvarez to discuss the technology behind Pure Key-Value Accelerator (KVA) and its role in accelerating AI inference. Pure KVA is a protocol-agnostic key-value caching solution that, combined with FlashBlade data storage, dramatically improves GPU efficiency and consistency in AI environments. Robert—whose background includes time as a Santa Clara University professor, NASA Solution Architect, and work at CERN—explains how this innovation is essential for serving an entire fleet of AI workloads, including modern agentic and chatbot interfaces. Robert dives into the massive growth of the AI inference market, driven by demand for near real-time processing and low-latency AI applications, a trend that makes a solution like Pure KVA critical. He details how KVA removes the GPU-memory bottleneck and shares compelling benchmark results: up to twenty times faster inference over NFS and six times faster over S3, all on standard Ethernet. These performance gains help enterprises scale more efficiently and reduce overall GPU costs. Beyond the technical deep dive, the episode explores the origin of the KVA idea, the unique Pure IP that enables it, and future integrations such as Dynamo and the partnership with Comet for LLM observability. In the popular “Hot Takes” segment, Robert offers his perspective on the blind spots IT leaders may have in managing AI data and shares advice for his younger self on the future of the data management space. To learn more about Pure KVA, visit purestorage.com/launch.
Check out the new Pure Storage digital customer community to join the conversation with peers and Pure experts: https://purecommunity.purestorage.com/

00:00 Intro and Welcome
02:21 Background on Our Guest
06:57 Stat of the Episode on AI Inferencing Spend
09:10 Why AI Inference is Difficult at Scale
11:00 How KV Cache Acceleration Works
14:50 Key Partnerships Using KVA
20:28 Hot Takes Segment
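The episode describes KVA as offloading the key-value cache that transformer inference otherwise keeps in GPU memory, so repeated prompt prefixes can skip recomputation. Pure KVA's actual implementation is proprietary and not shown here; the following is only a minimal, hypothetical sketch of the general KV-caching technique it builds on, with all names invented for illustration:

```python
# Illustrative sketch of prefix-based KV caching for LLM inference.
# Hypothetical names; this is NOT Pure KVA's implementation, just the
# general idea: store per-prefix key/value state outside the hot path
# so a repeated prompt prefix reuses cached work instead of recomputing.
import hashlib


class KVCache:
    """Maps a prompt-prefix fingerprint to its precomputed K/V state."""

    def __init__(self):
        self._store = {}  # fingerprint -> opaque kv_state

    @staticmethod
    def fingerprint(token_ids):
        # Content-addressed key for a token-id prefix.
        return hashlib.sha256(repr(tuple(token_ids)).encode()).hexdigest()

    def get(self, token_ids):
        return self._store.get(self.fingerprint(token_ids))

    def put(self, token_ids, kv_state):
        self._store[self.fingerprint(token_ids)] = kv_state


def attend(token_ids, cache, compute_kv):
    """Reuse K/V for the longest cached prefix; compute only the suffix.

    Returns (kv_state, tokens_served_from_cache).
    """
    for cut in range(len(token_ids), 0, -1):
        hit = cache.get(token_ids[:cut])
        if hit is not None:
            kv = hit + compute_kv(token_ids[cut:])  # suffix only
            cache.put(token_ids, kv)
            return kv, cut
    kv = compute_kv(token_ids)  # cold start: full recompute
    cache.put(token_ids, kv)
    return kv, 0
```

In this toy model, a second request sharing a three-token prefix with an earlier one reports three tokens served from cache and computes K/V only for the new token; the "up to twenty times faster" figures quoted in the episode come from avoiding exactly that kind of recompute (and from keeping the cache on fast shared storage rather than scarce GPU memory).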
263 episodes