Go offline with the Player FM app!
636: Red Hat's James Haung
Manage episode 525035138 series 2440919
Links
James on LinkedIn
Mike on LinkedIn
Mike's Blog
Show on Discord
- AI on Red Hat Enterprise Linux (RHEL)
Trust and Stability: RHEL provides the mission-critical foundation needed for workloads where security and reliability cannot be compromised.
Predictive vs. Generative: Acknowledging the hype of GenAI while maintaining support for traditional machine learning algorithms.
Determinism: The challenge of bringing consistency and security to emerging AI technologies in production environments.
- Rama-Llama & Containerization
Developer Simplicity: Rama-Llama helps developers run local LLMs easily without being "locked in" to specific engines; it supports Podman, Docker, and various inference engines like Llama.cpp and Whisper.cpp.
Production Path: The tool is designed to "fade away" after helping package the model and stack into a container that can be deployed directly to Kubernetes.
Behind the Firewall: Addressing the needs of industries (like aircraft maintenance) that require AI to stay strictly on-premises.
- Enterprise AI Infrastructure
Red Hat AI: A commercial product offering tools for model customization, including pre-training, fine-tuning, and RAG (Retrieval-Augmented Generation).
Inference Engines: James highlights the difference between Llama.cpp (for smaller/edge hardware) and vLLM, which has become the enterprise standard for multi-GPU data center inferencing.
584 episodes
Manage episode 525035138 series 2440919
Links
James on LinkedIn
Mike on LinkedIn
Mike's Blog
Show on Discord
- AI on Red Hat Enterprise Linux (RHEL)
Trust and Stability: RHEL provides the mission-critical foundation needed for workloads where security and reliability cannot be compromised.
Predictive vs. Generative: Acknowledging the hype of GenAI while maintaining support for traditional machine learning algorithms.
Determinism: The challenge of bringing consistency and security to emerging AI technologies in production environments.
- Rama-Llama & Containerization
Developer Simplicity: Rama-Llama helps developers run local LLMs easily without being "locked in" to specific engines; it supports Podman, Docker, and various inference engines like Llama.cpp and Whisper.cpp.
Production Path: The tool is designed to "fade away" after helping package the model and stack into a container that can be deployed directly to Kubernetes.
Behind the Firewall: Addressing the needs of industries (like aircraft maintenance) that require AI to stay strictly on-premises.
- Enterprise AI Infrastructure
Red Hat AI: A commercial product offering tools for model customization, including pre-training, fine-tuning, and RAG (Retrieval-Augmented Generation).
Inference Engines: James highlights the difference between Llama.cpp (for smaller/edge hardware) and vLLM, which has become the enterprise standard for multi-GPU data center inferencing.
584 episodes
All episodes
×Welcome to Player FM!
Player FM is scanning the web for high-quality podcasts for you to enjoy right now. It's the best podcast app and works on Android, iPhone, and the web. Signup to sync subscriptions across devices.