Content provided by Nicolay Gerold. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Nicolay Gerold or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://podcastplayer.com/legal.

Lance v2: Rethinking Columnar Storage for Faster Lookups, Nulls, and Flexible Encodings | changelog 2

21:33
 

In this episode of Changelog, Weston Pace walks through the latest updates to LanceDB, an open-source vector database and file format. Lance's new V2 file format rethinks the traditional notion of columnar storage to handle large multimodal datasets, such as images and embeddings, more efficiently. Weston discusses the goals driving LanceDB's development: null value support, multimodal data handling, and the right balance between point-lookup and full-scan performance.

Sound Bites

  • "A little bit more power to actually just try."
  • "We're becoming a little bit more feature complete with returns of arrow."
  • "Weird data representations that are actually really optimized for your use case."

Key Points

  • Weston introduces LanceDB, an open-source multimodal vector database and file format.
  • The goals behind LanceDB's design: handling null values, multimodal data, and finding the right balance between point lookups and full dataset scan performance.
  • Lance V2 File Format:
  • Potential Use Cases

Conversation Highlights

  • On the benefits of Arrow integration: Strengthening the connection with the Arrow data ecosystem for seamless data handling.
  • Why "columnar container format"?: A broader definition than "table format" to encompass more unconventional use cases.
  • Tackling multimodal data: How LanceDB V2 enables storage of large multimodal data efficiently and without needing tons of memory.
  • Python's role in encoding experimentation: Providing a way to rapidly prototype custom encodings and plug them into LanceDB.
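
To make the idea of prototyping a custom encoding in Python more concrete, here is a minimal sketch. It is not Lance's encoding API: the function names encode_column and decode_column are hypothetical, and run-length encoding is just a stand-in for whatever encoding one might experiment with. It stores nulls in a separate validity mask so that the column round-trips exactly.

```python
from typing import List, Optional, Tuple

# Hypothetical toy encoding, NOT part of Lance or LanceDB.
Runs = List[Tuple[int, int]]  # (value, run_length) pairs


def encode_column(values: List[Optional[int]]) -> Tuple[List[bool], Runs]:
    """Split a nullable column into a validity mask plus run-length-encoded values."""
    validity = [v is not None for v in values]
    runs: Runs = []
    for v in values:
        if v is None:
            continue  # nulls live only in the validity mask
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((v, 1))  # start a new run
    return validity, runs


def decode_column(validity: List[bool], runs: Runs) -> List[Optional[int]]:
    """Rebuild the original column, re-inserting nulls where the mask says so."""
    dense = iter([value for value, count in runs for _ in range(count)])
    return [next(dense) if is_valid else None for is_valid in validity]


column = [7, 7, None, 7, 3, 3, None]
validity, runs = encode_column(column)
assert decode_column(validity, runs) == column  # nulls survive the round trip
```

Keeping the null mask apart from the encoded values means the encoding itself never has to reserve a sentinel value for "missing", which is one simple way to get lossless round-tripping of nulls.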

LanceDB:

Weston Pace:

Nicolay Gerold:

Chapters

00:00 Introducing Lance: A New File Format

06:46 Enabling Custom Encodings in Lance

11:51 Exploring the Relationship Between Lance and Arrow

20:04 New Chapter

Lance file format, nulls, round-tripping data, optimized data representations, full-text search, encodings, downsides, multimodal data, compression, point lookups, full scan performance, non-contiguous columns, custom encodings
