LLM Inference Speed (Tech Deep Dive)

Thinking Machines: AI & Philosophy

Content provided by Daniel Reid Cahn.

2y ago 39:36

In this tech talk, we dive deep into the technical specifics around LLM inference.

The big questions: Why are LLMs slow? How can they be faster? And might slow inference affect UX in the next generation of AI-powered software?

We jump into:

Is fast model inference the real moat for LLM companies?
What are the implications of slow model inference on the future of decentralized and edge model inference?
As demand rises, what will the latency/throughput tradeoff look like?
What innovations on the horizon might massively speed up model inference?

#Tech #Machine Learning #Artificial Intelligence #Society #Philosophy #Daniel Reid Cahn #MLOps