Content provided by Dr. Satya Mallick. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Dr. Satya Mallick or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://podcastplayer.com/legal.
DeepSeek-OCR: A Revolutionary Idea

14:33

In this episode, we take a deep look at DeepSeek-OCR, a cutting-edge open-source optical character recognition (OCR) model built for accurate, efficient document understanding.

DeepSeek-OCR turns long-context processing on its head: it renders text as images, encodes those images into vision tokens, and then decodes the text back out, shrinking context length by 7–20× while preserving high fidelity.
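
To make the 7–20× figure concrete, the compression ratio can be read as the number of text tokens a page would cost divided by the number of vision tokens that stand in for it. The Python sketch below is a minimal illustration under that assumption; the helper names and the 1,000-token page are hypothetical and do not come from the DeepSeek-OCR codebase.

```python
import math


def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Text tokens represented per vision token (hypothetical helper)."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens


def vision_budget(text_tokens: int, ratio: float) -> int:
    """Vision tokens needed to cover text_tokens at a given compression ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    # Round up: a partial vision token still costs a whole token.
    return math.ceil(text_tokens / ratio)


# Hypothetical page: 1,000 text tokens rendered into 100 vision tokens.
print(f"{compression_ratio(1000, 100):.1f}x")  # 10.0x

# A one-million-token corpus at the ends and middle of the 7-20x range.
for r in (7, 10, 20):
    print(r, vision_budget(1_000_000, r))
```

At 10×, a million text tokens would fit in 100,000 vision tokens; whether the decoded text stays accurate at that ratio is a separate trade-off.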

We break down how the two-stage stack works, pairing DeepEncoder (optical/vision encoding of each page) with a mixture-of-experts (MoE) decoder (text reconstruction and reasoning), and explain why this “context optical compression” matters for million-token workflows, from legal PDFs to scientific tables.
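
One way to see how the two stages divide the work is to follow the token count through the encoder. The sketch below assumes the page is cut into 16×16-pixel patches and that the encoder then downsamples patch tokens 16× before handing them to the decoder; both numbers are our reading of the DeepSeek-OCR paper and should be treated as assumptions. The MoE decoder is not modeled here; it simply consumes the resulting vision tokens.

```python
from typing import NamedTuple


class TokenFlow(NamedTuple):
    patch_tokens: int   # tokens after splitting the page into square patches
    vision_tokens: int  # tokens left after the encoder's downsampling step


def encoder_token_flow(width: int, height: int,
                       patch: int = 16, downsample: int = 16) -> TokenFlow:
    """Count tokens through a hypothetical patchify-then-compress encoder.

    patch and downsample are assumed values, not confirmed DeepEncoder settings.
    """
    if width % patch or height % patch:
        raise ValueError("page size must be a multiple of the patch size")
    patch_tokens = (width // patch) * (height // patch)
    if patch_tokens % downsample:
        raise ValueError("patch count must be divisible by the downsample factor")
    return TokenFlow(patch_tokens, patch_tokens // downsample)


for side in (512, 640, 1024, 1280):
    flow = encoder_token_flow(side, side)
    print(f"{side}x{side}: {flow.patch_tokens} patches -> {flow.vision_tokens} vision tokens")
```

Under these assumptions a 1024×1024 page becomes 4,096 patch tokens but only 256 vision tokens for the decoder, which is where the context savings would come from.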

We also cover the accuracy trade-offs (≈96–97% decoding accuracy at ~10× compression), benchmarks, and the practical implications for cost, latency, and multimodal retrieval-augmented generation (RAG). If you care about scaling LLMs beyond brittle token limits, this is a paradigm shift to watch.
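
Because the ≈96–97% figure applies at roughly 10× compression, one practical policy is to give each page the smallest vision-token budget that keeps it at or below that ratio. In this sketch the budgets (64, 100, 256, 400 tokens) and the mode names are assumptions about the encoder's resolution modes, and `pick_mode` is a hypothetical helper, not part of the DeepSeek-OCR repo.

```python
from typing import Mapping, Optional

# Assumed vision-token budgets per resolution mode; verify against the paper.
ASSUMED_MODES: Mapping[str, int] = {"tiny": 64, "small": 100, "base": 256, "large": 400}


def pick_mode(
    text_tokens: int,
    max_ratio: float = 10.0,
    modes: Mapping[str, int] = ASSUMED_MODES,
) -> Optional[str]:
    """Return the cheapest mode whose compression ratio is <= max_ratio, else None."""
    # Try budgets from smallest (cheapest, lowest latency) to largest.
    for name, budget in sorted(modes.items(), key=lambda kv: kv[1]):
        if text_tokens / budget <= max_ratio:
            return name
    return None  # No mode keeps this page within the target ratio.


for page_tokens in (500, 800, 3000, 5000):
    print(page_tokens, pick_mode(page_tokens))
```

Lowering `max_ratio` spends more vision tokens (cost and latency) for a wider accuracy margin; when the function returns None, a caller might split the page or fall back to plain text.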

Resources:

  1. DeepSeek-OCR Repo: https://github.com/deepseek-ai/DeepSeek-OCR/tree/main
  2. DeepSeek-OCR Paper: https://github.com/deepseek-ai/DeepSeek-OCR/blob/main/DeepSeek_OCR_paper.pdf
  3. Start your AI career: https://opencv.org/university
  4. Need help in building AI solutions? https://bigvision.ai