John McBride: How to Build Your Own AI Infrastructure with Kubernetes
Links
- Codecrafters (sponsor): https://tej.as/codecrafters
- OpenSauced blog post: https://opensauced.pizza/blog/how-we-saved-thousands-of-dollars-deploying-low-cost-open-source-ai-technologies
- John on X: https://x.com/johncodezzz
- Tejas on X: https://x.com/tejaskumar_
Summary
John McBride discusses his experience deploying open-source AI technologies at scale with Kubernetes, sharing insights on building AI-enabled applications and the challenges of large-scale data engineering.
The conversation covers Kubernetes as a platform for running compute and the decision to use TimescaleDB for storing time-series data and vectors. McBride also stresses the importance of data-intensive application design and recommends the book 'Designing Data-Intensive Applications' for further reading.
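As a rough sketch of what storing time-series data and vectors side by side in TimescaleDB can look like (the table and column names here are illustrative, not from the episode):

```sql
-- Assumes the timescaledb and pgvector extensions are available.
CREATE EXTENSION IF NOT EXISTS timescaledb;
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical schema: one row per event, with an embedding column.
CREATE TABLE events (
    time       TIMESTAMPTZ NOT NULL,
    repo_name  TEXT,
    payload    JSONB,
    embedding  VECTOR(1536)  -- dimension depends on the embedding model
);

-- Turn the plain table into a hypertable partitioned by time.
SELECT create_hypertable('events', 'time');
```

Because TimescaleDB is a Postgres extension, the same database can serve time-series queries and vector similarity search without a separate vector store.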
The conversation then turns to migrating from OpenAI to an open-source large language model (LLM) inference engine. The switch was driven by cost optimization and the desire for more control over the infrastructure. vLLM was chosen as the inference engine for its compatibility with the OpenAI API and its performance. The migration involved deploying Kubernetes, setting up node groups with GPUs, running vLLM pods, and fronting them with a Kubernetes service for load balancing.
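The deployment shape described above can be sketched as a Kubernetes manifest. This is a minimal illustration with hypothetical names and a placeholder model, assuming a GPU node group already exists; it is not the exact configuration from the episode:

```yaml
# Sketch: vLLM pods behind a Kubernetes Service (names are illustrative).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm
spec:
  replicas: 2
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest  # serves an OpenAI-compatible API
          args: ["--model", "mistralai/Mistral-7B-Instruct-v0.2"]  # placeholder model
          resources:
            limits:
              nvidia.com/gpu: 1  # schedules the pod onto the GPU node group
---
apiVersion: v1
kind: Service
metadata:
  name: vllm-service
spec:
  selector:
    app: vllm
  ports:
    - port: 8000        # vLLM's default serving port
      targetPort: 8000
```

Because vLLM speaks the OpenAI API, application code can keep using an OpenAI client and simply point its base URL at the Service.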
Throughout, the emphasis is on choosing the right level of abstraction and understanding the trade-offs involved.
Takeaways
1. Building AI-enabled applications requires solid large-scale data engineering.
2. Kubernetes is an excellent platform for serving large-scale applications.
3. TimescaleDB, built on top of Postgres, is a suitable choice for storing both time-series data and vectors.
4. The book 'Designing Data-Intensive Applications' is recommended for understanding data-intensive application development.
5. Choosing the right level of abstraction is important, and it depends on factors such as expertise, time constraints, and specific requirements.
6. The use of Kubernetes can be complex and expensive, and it may not be necessary for all startups.
7. The decision to adopt Kubernetes should consider the scale and needs of the company, as well as the operational burden it may bring.
Chapters
00:00 John McBride
03:05 Introduction and Background
07:24 Summary of the Blog Post
12:15 The Role of Kubernetes in AI-Enabled Applications
16:10 The Use of TimeScaleDB for Storing Time-Series Data and Vectors
35:37 Migrating to an Open-Source LLM Inference Engine
47:35 Deploying Kubernetes and Setting Up Node Groups
55:14 Choosing vLLM as the Inference Engine
1:02:21 The Migration Process: Deploying Kubernetes and Setting Up Node Groups
1:08:02 Choosing the Right Level of Abstraction
1:24:12 Challenges in Evaluating Language Model Performance
1:31:41 Considerations for Adopting Kubernetes in Startups
Hosted on Acast. See acast.com/privacy for more information.