John McBride: How to Build Your Own AI Infrastructure with Kubernetes

Duration: 1:40:57
 

Links


- Codecrafters (sponsor): https://tej.as/codecrafters

- OpenSauced blog post: https://opensauced.pizza/blog/how-we-saved-thousands-of-dollars-deploying-low-cost-open-source-ai-technologies

- John on X: https://x.com/johncodezzz

- Tejas on X: https://x.com/tejaskumar_


Summary


John McBride discusses his experience deploying open-source AI technologies at scale with Kubernetes. He shares insights on building AI-enabled applications and the challenges of large-scale data engineering.


The conversation focuses on the use of Kubernetes as a platform for running compute and on the decision to use TimescaleDB, which runs on Postgres, for storing both time-series data and vectors. McBride also highlights the demands of building data-intensive applications and recommends the book 'Designing Data-Intensive Applications' for further reading.
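
To make the TimescaleDB choice concrete, here is a minimal sketch of keeping time-series rows and embedding vectors in the same Postgres instance. This is not code from the episode: it assumes the psycopg2 driver, the TimescaleDB and pgvector extensions, and a hypothetical repo_events table, with 3-dimensional vectors used only for brevity.

```python
# Illustrative sketch: one Postgres database storing both time-series rows
# (a TimescaleDB hypertable) and embedding vectors (pgvector). Table and
# column names are hypothetical, not taken from the episode.
import psycopg2

conn = psycopg2.connect("postgresql://user:pass@localhost:5432/appdb")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS timescaledb;")
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")  # pgvector

cur.execute("""
    CREATE TABLE IF NOT EXISTS repo_events (
        time       TIMESTAMPTZ NOT NULL,
        repo       TEXT        NOT NULL,
        event_type TEXT        NOT NULL,
        embedding  vector(3)   -- toy dimension; real embeddings are much wider
    );
""")

# create_hypertable() tells TimescaleDB to partition the table by time.
cur.execute("SELECT create_hypertable('repo_events', 'time', if_not_exists => TRUE);")
conn.commit()

# Vector similarity search (pgvector's <-> is L2 distance), scoped to a time window.
cur.execute("""
    SELECT repo, event_type
    FROM repo_events
    WHERE time > now() - interval '7 days'
    ORDER BY embedding <-> %s::vector
    LIMIT 5;
""", ("[0.1, 0.2, 0.3]",))
print(cur.fetchall())
```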


The conversation then turns to migrating from OpenAI's hosted API to a self-hosted open-source large language model (LLM). The switch was driven by cost optimization and a desire for more control over the infrastructure. vLLM was chosen as the inference engine for its performance and its compatibility with the OpenAI API. The migration involved deploying Kubernetes, setting up node groups with GPUs, running vLLM pods, and putting a Kubernetes Service in front of them for load balancing.
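
One reason the OpenAI-API compatibility matters in practice: the application code barely changes when switching providers. A minimal sketch of the client side, assuming the openai Python package; the in-cluster Service name and model name are hypothetical stand-ins, not details from the episode.

```python
# Client-side sketch of the migration: same OpenAI client, different base URL.
from openai import OpenAI

# Before: client = OpenAI(api_key="sk-...")  # hosted OpenAI
# After: point the client at the vLLM pods, reached through a Kubernetes
# Service that load-balances across them (hypothetical DNS name below).
client = OpenAI(
    base_url="http://vllm.inference.svc.cluster.local:8000/v1",
    api_key="unused",  # vLLM does not check the key unless configured to
)

resp = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # whichever model the pods serve
    messages=[{"role": "user", "content": "Summarize this repository's activity."}],
)
print(resp.choices[0].message.content)
```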


The conversation emphasizes the importance of choosing the right level of abstraction and understanding the trade-offs involved.
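
As a rough illustration of the low-abstraction end of that trade-off, here is what the setup described above might look like when driven through the kubernetes Python client: vLLM pods pinned to GPU nodes, fronted by a Service for load balancing. The namespace, image tag, node label, and model are assumptions for the sketch, not details from the episode.

```python
# Sketch of deploying vLLM behind a Kubernetes Service with the kubernetes
# Python client. Namespace, image tag, node label, and model are assumptions.
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside a cluster

labels = {"app": "vllm"}

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="vllm", labels=labels),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels=labels),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels=labels),
            spec=client.V1PodSpec(
                # Schedule onto the GPU node group; the label is hypothetical.
                node_selector={"node-group": "gpu"},
                containers=[
                    client.V1Container(
                        name="vllm",
                        image="vllm/vllm-openai:latest",
                        args=["--model", "mistralai/Mistral-7B-Instruct-v0.2"],
                        ports=[client.V1ContainerPort(container_port=8000)],
                        resources=client.V1ResourceRequirements(
                            limits={"nvidia.com/gpu": "1"}  # one GPU per pod
                        ),
                    )
                ],
            ),
        ),
    ),
)

# The Service gives the pods one stable address and spreads requests
# across replicas -- the load balancing mentioned above.
service = client.V1Service(
    api_version="v1",
    kind="Service",
    metadata=client.V1ObjectMeta(name="vllm"),
    spec=client.V1ServiceSpec(
        selector=labels,
        ports=[client.V1ServicePort(port=8000, target_port=8000)],
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="inference", body=deployment)
client.CoreV1Api().create_namespaced_service(namespace="inference", body=service)
```

Whether owning this layer beats a managed endpoint comes back to the expertise, time constraints, and requirements discussed in the episode.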


Takeaways


1. Building AI-enabled applications requires solid data engineering at scale.

2. Kubernetes is an excellent platform for running large-scale applications.

3. TimescaleDB, built on top of Postgres, is a suitable choice for storing both time-series data and vectors.

4. The book 'Designing Data-Intensive Applications' is recommended for understanding data-intensive application development.

5. Choosing the right level of abstraction is important, and it depends on factors such as expertise, time constraints, and specific requirements.

6. The use of Kubernetes can be complex and expensive, and it may not be necessary for all startups.

7. The decision to adopt Kubernetes should consider the scale and needs of the company, as well as the operational burden it may bring.


Chapters


00:00 John McBride

03:05 Introduction and Background

07:24 Summary of the Blog Post

12:15 The Role of Kubernetes in AI-Enabled Applications

16:10 The Use of TimescaleDB for Storing Time-Series Data and Vectors

35:37 Migrating to an Open-Source LLM Inference Engine

47:35 Deploying Kubernetes and Setting Up Node Groups

55:14 Choosing vLLM as the Inference Engine

1:02:21 The Migration Process: Deploying Kubernetes and Setting Up Node Groups

1:08:02 Choosing the Right Level of Abstraction

1:24:12 Challenges in Evaluating Language Model Performance

1:31:41 Considerations for Adopting Kubernetes in Startups


Hosted on Acast. See acast.com/privacy for more information.
