John McBride: How to Build Your Own AI Infrastructure with Kubernetes

Duration: 1:40:57
 

Links


- Codecrafters (sponsor): https://tej.as/codecrafters

- OpenSauced blog post: https://opensauced.pizza/blog/how-we-saved-thousands-of-dollars-deploying-low-cost-open-source-ai-technologies

- John on X: https://x.com/johncodezzz

- Tejas on X: https://x.com/tejaskumar_


Summary


John McBride discusses his experience deploying open-source AI technologies at scale with Kubernetes. He shares insights on building AI-enabled applications and the challenges of large-scale data engineering.


The conversation focuses on the use of Kubernetes as a platform for running compute and on the decision to use TimescaleDB, which runs on Postgres, for storing both time-series data and vectors. McBride also highlights the demands of building data-intensive applications and recommends the book 'Designing Data-Intensive Applications' for further reading.
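
To make the TimescaleDB choice concrete, here is a minimal sketch of keeping time-series rows and embedding vectors in the same Postgres instance. This is not code from the episode: it assumes the psycopg2 driver, the TimescaleDB and pgvector extensions, and a hypothetical repo_events table, with 3-dimensional vectors used only for brevity.

```python
# Illustrative sketch: one Postgres database storing both time-series rows
# (a TimescaleDB hypertable) and embedding vectors (pgvector). Table and
# column names are hypothetical, not taken from the episode.
import psycopg2

conn = psycopg2.connect("postgresql://user:pass@localhost:5432/appdb")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS timescaledb;")
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")  # pgvector

cur.execute("""
    CREATE TABLE IF NOT EXISTS repo_events (
        time       TIMESTAMPTZ NOT NULL,
        repo       TEXT        NOT NULL,
        event_type TEXT        NOT NULL,
        embedding  vector(3)   -- toy dimension; real embeddings are much wider
    );
""")

# create_hypertable() tells TimescaleDB to partition the table by time.
cur.execute("SELECT create_hypertable('repo_events', 'time', if_not_exists => TRUE);")
conn.commit()

# Vector similarity search (pgvector's <-> is L2 distance), scoped to a time window.
cur.execute("""
    SELECT repo, event_type
    FROM repo_events
    WHERE time > now() - interval '7 days'
    ORDER BY embedding <-> %s::vector
    LIMIT 5;
""", ("[0.1, 0.2, 0.3]",))
print(cur.fetchall())
```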


The conversation then turns to migrating from OpenAI's hosted API to a self-hosted open-source large language model (LLM). The switch was driven by cost optimization and a desire for more control over the infrastructure. vLLM was chosen as the inference engine for its performance and its compatibility with the OpenAI API. The migration involved deploying Kubernetes, setting up node groups with GPUs, running vLLM pods, and putting a Kubernetes Service in front of them for load balancing.
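
One reason the OpenAI-API compatibility matters in practice: the application code barely changes when switching providers. A minimal sketch of the client side, assuming the openai Python package; the in-cluster Service name and model name are hypothetical stand-ins, not details from the episode.

```python
# Client-side sketch of the migration: same OpenAI client, different base URL.
from openai import OpenAI

# Before: client = OpenAI(api_key="sk-...")  # hosted OpenAI
# After: point the client at the vLLM pods, reached through a Kubernetes
# Service that load-balances across them (hypothetical DNS name below).
client = OpenAI(
    base_url="http://vllm.inference.svc.cluster.local:8000/v1",
    api_key="unused",  # vLLM does not check the key unless configured to
)

resp = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # whichever model the pods serve
    messages=[{"role": "user", "content": "Summarize this repository's activity."}],
)
print(resp.choices[0].message.content)
```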


The conversation emphasizes the importance of choosing the right level of abstraction and understanding the trade-offs involved.
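
As a rough illustration of the low-abstraction end of that trade-off, here is what the setup described above might look like when driven through the kubernetes Python client: vLLM pods pinned to GPU nodes, fronted by a Service for load balancing. The namespace, image tag, node label, and model are assumptions for the sketch, not details from the episode.

```python
# Sketch of deploying vLLM behind a Kubernetes Service with the kubernetes
# Python client. Namespace, image tag, node label, and model are assumptions.
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside a cluster

labels = {"app": "vllm"}

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="vllm", labels=labels),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels=labels),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels=labels),
            spec=client.V1PodSpec(
                # Schedule onto the GPU node group; the label is hypothetical.
                node_selector={"node-group": "gpu"},
                containers=[
                    client.V1Container(
                        name="vllm",
                        image="vllm/vllm-openai:latest",
                        args=["--model", "mistralai/Mistral-7B-Instruct-v0.2"],
                        ports=[client.V1ContainerPort(container_port=8000)],
                        resources=client.V1ResourceRequirements(
                            limits={"nvidia.com/gpu": "1"}  # one GPU per pod
                        ),
                    )
                ],
            ),
        ),
    ),
)

# The Service gives the pods one stable address and spreads requests
# across replicas -- the load balancing mentioned above.
service = client.V1Service(
    api_version="v1",
    kind="Service",
    metadata=client.V1ObjectMeta(name="vllm"),
    spec=client.V1ServiceSpec(
        selector=labels,
        ports=[client.V1ServicePort(port=8000, target_port=8000)],
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="inference", body=deployment)
client.CoreV1Api().create_namespaced_service(namespace="inference", body=service)
```

Whether owning this layer beats a managed endpoint comes back to the expertise, time constraints, and requirements discussed in the episode.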


Takeaways


1. Building AI-enabled applications requires solid data engineering at scale.

2. Kubernetes is an excellent platform for running large-scale applications.

3. TimescaleDB, built on top of Postgres, is a suitable choice for storing both time-series data and vectors.

4. The book 'Designing Data-Intensive Applications' is recommended for understanding data-intensive application development.

5. Choosing the right level of abstraction is important, and it depends on factors such as expertise, time constraints, and specific requirements.

6. The use of Kubernetes can be complex and expensive, and it may not be necessary for all startups.

7. The decision to adopt Kubernetes should consider the scale and needs of the company, as well as the operational burden it may bring.


Chapters


00:00 John McBride

03:05 Introduction and Background

07:24 Summary of the Blog Post

12:15 The Role of Kubernetes in AI-Enabled Applications

16:10 The Use of TimescaleDB for Storing Time-Series Data and Vectors

35:37 Migrating to an Open-Source LLM Inference Engine

47:35 Deploying Kubernetes and Setting Up Node Groups

55:14 Choosing vLLM as the Inference Engine

1:02:21 The Migration Process: Deploying Kubernetes and Setting Up Node Groups

1:08:02 Choosing the Right Level of Abstraction

1:24:12 Challenges in Evaluating Language Model Performance

1:31:41 Considerations for Adopting Kubernetes in Startups


Hosted on Acast. See acast.com/privacy for more information.
