Large Language Models On The Edge
We discuss the paper "A Review on Edge Large Language Models: Design, Execution, and Applications" (https://arxiv.org/pdf/2410.11845), a survey of the design, execution, and applications of large language models (LLMs) on edge devices. It highlights the challenges of deploying models with billions of parameters on resource-constrained hardware, including tight memory budgets and heavy computational demands. The paper examines offline, pre-deployment techniques that make models more efficient, such as quantization, pruning, knowledge distillation, and low-rank approximation, as well as online runtime optimizations spanning software-level techniques, hardware-software co-design, and hardware-level considerations. Finally, it showcases on-device LLM applications across personal, enterprise, and industrial domains, and discusses open challenges and future research directions in the field.
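To give a feel for one of the pre-deployment techniques mentioned above, here is a minimal sketch of symmetric per-tensor int8 post-training quantization: floating-point weights are mapped to 8-bit integers plus a single scale factor, cutting memory use to roughly a quarter of fp32. The function names and the toy weight values are illustrative, not from the paper.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> int8 values plus a scale.

    The scale maps the largest-magnitude weight to the int8 limit (127),
    so dequantized values approximate the originals within ~scale/2.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [qi * scale for qi in q]


# Toy example: four weights stored as int8 plus one float scale.
weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Each restored value is within scale/2 of the original weight.
```

Real edge deployments layer further tricks on top of this idea (per-channel scales, 4-bit formats, activation-aware schemes), which the survey covers in its quantization section.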