Large Language Models On The Edge
We discuss the paper "A Review on Edge Large Language Models: Design, Execution, and Applications" (https://arxiv.org/pdf/2410.11845), a survey of how large language models (LLMs) are designed, executed, and applied on edge devices. It highlights the challenges of deploying models with billions of parameters on resource-constrained hardware, including tight memory budgets and heavy computational demands. The paper covers offline pre-deployment techniques, such as quantization, pruning, knowledge distillation, and low-rank approximation, that shrink models before they ship, as well as online runtime optimizations spanning software-level techniques, hardware-software co-design, and hardware-level considerations. Finally, it showcases on-device LLM applications across personal, enterprise, and industrial domains and discusses open challenges and future research directions in this field.
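To make the first of those pre-deployment techniques concrete, here is a minimal sketch of symmetric int8 post-training quantization, the simplest form of the quantization family the survey covers. The function names and the per-tensor scaling scheme are illustrative assumptions, not the paper's specific method; real deployments typically use per-channel or group-wise scales.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: map float32 weights to int8.

    scale is chosen so the largest-magnitude weight maps to +/-127,
    illustrative only -- production schemes use finer granularity.
    """
    scale = float(np.max(np.abs(w))) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

# Demo on a small random weight matrix.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# int8 storage is 4x smaller than float32; reconstruction error
# is bounded by half the quantization step (scale / 2).
max_err = float(np.max(np.abs(w - w_hat)))
```

The memory saving (1 byte per weight instead of 4) is exactly the kind of win that makes billion-parameter models fit on edge hardware, at the cost of a bounded rounding error per weight.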