Content provided by Tobias Macey. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Tobias Macey or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://player.fm/legal.
From Context to Semantics: How Metadata Powers Agentic AI
Summary
In this episode, Suresh Srinivas and Sriharsha Chintalapani explore how metadata platforms are evolving from human-centric catalogs into the foundational context layer for AI and agentic systems. They discuss the origins and growth of OpenMetadata and Collate, why "context" is necessary but "semantics" is critical for precise AI outcomes, and how a schema-first, API-first, unified platform enables discovery, observability, and governance in one workflow. They share how AI agents can now automate documentation, classification, data quality testing, and policy enforcement, and why aligning governance with user identity and intent is essential as agentic access scales. They also dig into scalability strategies, MCP-based agent workflows, AI governance (including model and agent tracking), and the emerging convergence of big data with ontologies to deliver machine-understandable meaning.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed: flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI engineering, streaming: Prefect runs it all, from ingestion to activation, in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.
- Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution that they'll guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
- Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.
- You're a developer who wants to innovate. Instead, you're stuck fixing bottlenecks and fighting legacy code. MongoDB can help. It's a flexible, unified platform that's built for developers, by developers. MongoDB is ACID compliant and enterprise-ready, with the capabilities you need to ship AI apps fast. That's why so many of the Fortune 500 trust MongoDB with their most critical workloads. Ready to think outside rows and columns? Start building at MongoDB.com/Build
- Your host is Tobias Macey and today I'm interviewing Suresh Srinivas and Sriharsha Chintalapani about how metadata catalogs provide the context clues necessary to give meaning to your data for AI systems
- Introduction
- How did you get involved in the area of data management?
- Can you start by giving an overview of the roles that metadata catalogs are playing in the current state of the ecosystem?
- How has the OpenMetadata platform evolved over the past 4 years?
- How has the focus on LLMs/generative AI changed the trajectory of services like OpenMetadata?
- The initial set of use cases for data catalogs was to facilitate discovery and documentation of data assets for human consumption. What are the structural elements of that effort that have paid dividends for an AI audience?
- How does the AI audience change the requirements around the cataloging and presentation of metadata?
- One of the constant challenges in data infrastructure now is the tension of making data accessible to AI systems (agentic or otherwise) and incorporating AI into the inner loop of the service. What are the opportunities for bringing AI inside the boundaries of a system like OpenMetadata vs. as a client or consumer of the platform?
- The key phrase of the past ~2 years is "context engineering". What role does the metadata catalog play in that undertaking?
- What are the capabilities that the catalog needs to be able to effectively populate and curate that context?
- How much awareness does the LLM or agent need to have to be able to use the catalog effectively?
- What does a typical workflow/agent loop look like when it is using something like OpenMetadata in pursuit of knowledge that it needs to achieve an objective?
- How do agentic use cases strain the existing set of governance frameworks?
- What new considerations (procedural or technical) need to be factored into governance practices to balance velocity with security?
- What are the most interesting, innovative, or unexpected ways that you have seen OpenMetadata/Collate used in AI/agentic contexts?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on OpenMetadata/Collate?
- When is OpenMetadata/Collate the wrong choice?
- What do you have planned for the future of OpenMetadata?
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
- OpenMetadata
- Hadoop
- Hortonworks
- Context Engineering
- MCP (Model Context Protocol)
- JSON Schema
- dbt
- LangSmith
- OpenMetadata MCP Server
- API Gateway
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA