Earley AI Podcast Episode 70 - AI at Scale: Why Infrastructure Matters More Than Ever

Duration: 30:57
 
This episode features a fascinating conversation with Sid Sheth, CEO and Co-Founder of d-Matrix. Sid brings a deep background in building advanced systems for high-performance workloads, and he and his team are at the forefront of AI compute innovation, focused specifically on making AI inference more efficient, cost-effective, and scalable for enterprise use. Host Seth Earley dives into Sid’s journey, the architectural shifts in AI infrastructure, and what they mean for organizations seeking to maximize their AI investments.

Key Takeaways:

  • The Evolution of AI Infrastructure: Sid breaks down how the traditional tech stack is being rebuilt to support the unique demands of AI, particularly shifting from general-purpose CPUs to specialized accelerators for inference.
  • Training vs. Inference: Using a human analogy, Sid explains the fundamental difference between model training (learning) and inference (applying knowledge), emphasizing why most enterprise value comes from efficient inference.
  • Purpose-built Accelerators: d-Matrix’s inference-only accelerators dramatically reduce overhead, latency, energy consumption, and cost compared to traditional GPU solutions.
  • Scalability & Efficiency: Learn how in-memory compute, chiplets, and innovative memory architectures enable d-Matrix to deliver up to 10x lower latency and significant gains in energy and cost efficiency for AI applications.
  • Market Trends: Sid reveals how, although today’s focus is largely on training compute, the next five to ten years will see inference dominate as organizations seek ROI from deployed AI.
  • Enterprise Strategy Advice: Sid urges tech leaders not to be conservative, but to embrace a heterogeneous and flexible infrastructure strategy to future-proof their AI investments.
  • Real-World Use Cases: Hear about d-Matrix’s work enabling low-latency agentic/reasoning models, which are critical for real-time and interactive AI workloads.

Insightful Quote from Sid Sheth:

“Now is not the time to be conservative and get comfortable with choice. In the world of inference there isn’t going to be one size fits all... The world of the future is heterogeneous, where you’re going to have a compute fleet that is augmented with different types of compute to serve different needs.”

Tune in to discover how to rethink your AI infrastructure strategy and stay ahead in the rapidly evolving world of enterprise AI!

Thanks to our sponsors:
