Best Tobias Macey Podcasts (2025)

1
Blurring Lines: Data, AI, and the New Playbook for Team Velocity 1:00:57

Play Pause

8d ago1:00:57

1:00:57

Summary In this crossover episode, Max Beauchemin explores how multiplayer, multi‑agent engineering is transforming the way individuals and teams build data and AI systems. He digs into the shifting boundary between data and AI engineering, the rise of “context as code,” and how just‑in‑time retrieval via MCP and CLIs lets agents gather what they n…

1
State, Scale, and Signals: Rethinking Orchestration with Durable Execution 51:46

15d ago51:46

51:46

Summary In this episode Preeti Somal, EVP of Engineering at Temporal, talks about the durable execution model and how it reshapes the way teams build reliable, stateful systems for data and AI. She explores Temporal’s code‑first programming model—workflows, activities, task queues, and replay—and how it eliminates hand‑rolled retry, checkpoint, and…

1
The AI Data Paradox: High Trust in Models, Low Trust in Data 51:35

22d ago51:35

51:35

Summary In this episode of the Data Engineering Podcast Ariel Pohoryles, head of product marketing for Boomi's data management offerings, talks about a recent survey of 300 data leaders on how organizations are investing in data to scale AI. He shares a paradox uncovered in the research: while 77% of leaders trust the data feeding their AI systems,…

1
Bridging the AI–Data Gap: Collect, Curate, Serve 50:40

29d ago50:40

50:40

Summary In this episode of the Data Engineering Podcast Omri Lifshitz (CTO) and Ido Bronstein (CEO) of Upriver talk about the growing gap between AI's demand for high-quality data and organizations' current data practices. They discuss why AI accelerates both the supply and demand sides of data, highlighting that the bottleneck lies in the "middle …

1
Beyond the Perimeter: Practical Patterns for Fine‑Grained Data Access 1:05:00

1M ago1:05:00

1:05:00

Summary In this episode of the Data Engineering Podcast Matt Topper, president of UberEther, talks about the complex challenge of identity, credentials, and access control in modern data platforms. With the shift to composable ecosystems, integration burdens have exploded, fracturing governance and auditability across warehouses, lakes, files, vect…

1
The True Costs of Legacy Systems: Technical Debt, Risk, and Exit Strategies 1:04:16

1M ago1:04:16

1:04:16

Summary In this episode Kate Shaw, Senior Product Manager for Data and SLIM at SnapLogic, talks about the hidden and compounding costs of maintaining legacy systems—and practical strategies for modernization. She unpacks how “legacy” is less about age and more about when a system becomes a risk: blocking innovation, consuming excess IT time, and cr…

1
Context Engineering as a Discipline: Building Governed AI Analytics 51:58

2M ago51:58

51:58

Summary In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Nick Schrock, CTO and founder of Dagster Labs, to discuss Compass - a Slack-native, agentic analytics system designed to keep data teams connected with business stakeholders. Nick shares his journey from initial skepticism to embracing agentic AI as model and a…

1
The Data Model That Captures Your Business: Metric Trees Explained 1:01:05

2M ago1:01:05

1:01:05

Summary In this episode of the Data Engineering Podcast Vijay Subramanian, founder and CEO of Trace, talks about metric trees - a new approach to data modeling that directly captures a company's business model. Vijay shares insights from his decade-long experience building data practices at Rent the Runway and explains how the modern data stack has…

1
From GPUs-as-a-Service to Workloads-as-a-Service: Flex AI’s Path to High-Utilization AI Infra 56:31

2M ago56:31

56:31

Summary In this crossover episode of the AI Engineering Podcast, host Tobias Macey interviews Brijesh Tripathi, CEO of Flex AI, about revolutionizing AI engineering by removing DevOps burdens through "workload as a service". Brijesh shares his expertise from leading AI/HPC architecture at Intel and deploying supercomputers like Aurora, highlighting…

1
From RAG to Relational: How Agentic Patterns Are Reshaping Data Architecture 52:58

3M ago52:58

52:58

Summary In this episode of the AI Engineering Podcast Mark Brooker, VP and Distinguished Engineer at AWS, talks about how agentic workflows are transforming database usage and infrastructure design. He discusses the evolving role of data in AI systems, from traditional models to more modern approaches like vectors, RAG, and relational databases. Ma…

1
Duck Lake: Simplifying the Lakehouse Ecosystem 1:10:41

3M ago1:10:41

1:10:41

Summary In this episode of the Data Engineering Podcast Hannes Mühleisen and Mark Raasveldt, the creators of DuckDB, share their work on Duck Lake, a new entrant in the open lakehouse ecosystem. They discuss how Duck Lake, is focused on simplicity, flexibility, and offers a unified catalog and table format compared to other lakehouse formats like I…

1
Aligning Business and Data: The Essential Role of Data Modeling 1:06:51

3M ago1:06:51

1:06:51

Summary In this episode of the Data Engineering Podcast Serge Gershkovich, head of product at SQL DBM, talks about the socio-technical aspects of data modeling. Serge shares his background in data modeling and highlights its importance as a collaborative process between business stakeholders and data teams. He debunks common misconceptions that dat…

1
From Academia to Industry: Bridging Data Engineering Challenges 50:54

3M ago50:54

50:54

Summary In this episode of the Data Engineering Podcast Professor Paul Groth, from the University of Amsterdam, talks about his research on knowledge graphs and data engineering. Paul shares his background in AI and data management, discussing the evolution of data provenance and lineage, as well as the challenges of data integration. He explores t…

1
High Performance And Low Overhead Graphs With KuzuDB 1:01:29

4M ago1:01:29

1:01:29

Summary In this episode of the Data Engineering Podcast Prashanth Rao, an AI engineer at KuzuDB, talks about their embeddable graph database. Prashanth explains how KuzuDB addresses performance shortcomings in existing solutions through columnar storage and novel join algorithms. He discusses the usability and scalability of KuzuDB, emphasizing its…

1
Bridging Data and Decision-Making: AI's Role in Modern Analytics 1:10:44

4M ago1:10:44

1:10:44

Summary In this episode of the Data Engineering Podcast Lucas Thelosen and Drew Gilson from Gravity talk about their development of Orion, an autonomous data analyst that bridges the gap between data availability and business decision-making. Lucas and Drew share their backgrounds in data analytics and how their experiences have shaped their approa…

1
From Bits to Tables: The Evolution of S3 Storage 50:08

4M ago50:08

50:08

Summary In this episode of the Data Engineering Podcast Andy Warfield talks about the innovative functionalities of S3 Tables and Vectors and their integration into modern data stacks. Andy shares his journey through the tech industry and his role at Amazon, where he collaborates to enhance storage capabilities, discussing the evolution of S3 from …

1
Revolutionizing Python Notebooks with Marimo 51:56

4M ago51:56

51:56

Summary In this episode of the Data Engineering Podcast Akshay Agrawal from Marimo discusses the innovative new Python notebook environment, which offers a reactive execution model, full Python integration, and built-in UI elements to enhance the interactive computing experience. He discusses the challenges of traditional Jupyter notebooks, such as…

1
Warehouse Native Incremental Data Processing With Dynamic Tables And Delayed View Semantics 55:07

4M ago55:07

55:07

Summary In this episode of the Data Engineering Podcast Dan Sotolongo from Snowflake talks about the complexities of incremental data processing in warehouse environments. Dan discusses the challenges of handling continuously evolving datasets and the importance of incremental data processing for optimized resource use and reduced latency. He expla…

1
Streamlining Data Pipelines with MCP Servers and Vector Engines 52:04

5M ago52:04

52:04

Summary In this episode of the Data Engineering Podcast Kacper Łukawski from Qdrant about integrating MCP servers with vector databases to process unstructured data. Kacper shares his experience in data engineering, from building big data pipelines in the automotive industry to leveraging large language models (LLMs) for transforming unstructured d…

1
Foundational Data Engineering At Two Sigma 55:05

5M ago55:05

55:05

Summary In this episode of the Data Engineering Podcast Effie Baram, a leader in foundational data engineering at Two Sigma, talks about the complexities and innovations in data engineering within the finance sector. She discusses the critical role of data at Two Sigma, balancing data quality with delivery speed, and the socio-technical challenges …

1
Enabling Agents In The Enterprise With A Platform Approach 54:18

5M ago54:18

54:18

Summary In this episode of the Data Engineering Podcast Arun Joseph talks about developing and implementing agent platforms to empower businesses with agentic capabilities. From leading AI engineering at Deutsche Telekom to his current entrepreneurial venture focused on multi-agent systems, Arun shares insights on building agentic systems at an org…

1
Dagster's New Era: Modularizing Data Transformation in the Age of AI 1:01:37

6M ago1:01:37

1:01:37

Summary In this episode of the Data Engineering Podcast we welcome back Nick Schrock, CTO and founder of Dagster Labs, to discuss the evolving landscape of data engineering in the age of AI. As AI begins to impact data platforms and the role of data engineers, Nick shares his insights on how it will ultimately enhance productivity and expand softwa…

1
AI and the Lakehouse: How Starburst is Pioneering New Workflows 44:09

6M ago44:09

44:09

Summary In this episode of the Data Engineering Podcast Alex Albu, tech lead for AI initiatives at Starburst, talks about integrating AI workloads with the lakehouse architecture. From his software engineering roots to leading data engineering efforts, Alex shares insights on enhancing Starburst's platform to support AI applications, including an A…

1
Amazon S3: The Backbone of Modern Data Systems 1:01:01

6M ago1:01:01

1:01:01

Summary In this episode of the Data Engineering Podcast Mai-Lan Tomsen Bukovec, Vice President of Technology at AWS, talks about the evolution of Amazon S3 and its profound impact on data architecture. From her work on compute systems to leading the development and operations of S3, Mylan shares insights on how S3 has become a foundational element …

1
Scaling Data Operations With Platform Engineering 42:20

6M ago42:20

42:20

Summary In this episode of the Data Engineering Podcast Chakravarthy Kotaru talks about scaling data operations through standardized platform offerings. From his roots as an Oracle developer to leading the data platform at a major online travel company, Chakravarthy shares insights on managing diverse database technologies and providing databases a…

1
From Data Discovery to AI: The Evolution of Semantic Layers 49:30

6M ago49:30

49:30

Summary In this episode of the Data Engineering Podcast, host Tobias Macy welcomes back Shinji Kim to discuss the evolving role of semantic layers in the era of AI. As they explore the challenges of managing vast data ecosystems and providing context to data users, they delve into the significance of semantic layers for AI applications. They dive i…

1
Balancing Off-the-Shelf and Custom Solutions in Data Engineering 46:05

7M ago46:05

46:05

Summary In this episode of the Data Engineering Podcast Tulika Bhatt, a senior software engineer at Netflix, talks about her experiences with large-scale data processing and the future of data engineering technologies. Tulika shares her journey into the data engineering field, discussing her work at BlackRock and Verizon before joining Netflix, and…

1
StarRocks: Bridging Lakehouse and OLAP for High-Performance Analytics 59:41

7M ago59:41

59:41

Summary In this episode of the Data Engineering Podcast Sida Shen, product manager at CelerData, talks about StarRocks, a high-performance analytical database. Sida discusses the inception of StarRocks, which was forked from Apache Doris in 2020 and evolved into a high-performance Lakehouse query engine. He explains the architectural design of Star…

1
Exploring NATS: A Multi-Paradigm Connectivity Layer for Distributed Applications 1:12:50

7M ago1:12:50

1:12:50

Summary In this episode of the Data Engineering Podcast Derek Collison, creator of NATS and CEO of Synadia, talks about the evolution and capabilities of NATS as a multi-paradigm connectivity layer for distributed applications. Derek discusses the challenges and solutions in building distributed systems, and highlights the unique features of NATS t…

1
Advanced Lakehouse Management With The LakeKeeper Iceberg REST Catalog 57:13

8M ago57:13

57:13

Summary In this episode of the Data Engineering Podcast Viktor Kessler, co-founder of Vakmo, talks about the architectural patterns in the lake house enabled by a fast and feature-rich Iceberg catalog. Viktor shares his journey from data warehouses to developing the open-source project, Lakekeeper, an Apache Iceberg REST catalog written in Rust tha…

1
Simplifying Data Pipelines with Durable Execution 39:49

8M ago39:49

39:49

Summary In this episode of the Data Engineering Podcast Jeremy Edberg, CEO of DBOS, about durable execution and its impact on designing and implementing business logic for data systems. Jeremy explains how DBOS's serverless platform and orchestrator provide local resilience and reduce operational overhead, ensuring exactly-once execution in distrib…

1
Overcoming Redis Limitations: The Dragonfly DB Approach 43:58

8M ago43:58

43:58

Summary In this episode of the Data Engineering Podcast Roman Gershman, CTO and founder of Dragonfly DB, explores the development and impact of high-speed in-memory databases. Roman shares his experience creating a more efficient alternative to Redis, focusing on performance gains, scalability, and cost efficiency, while addressing limitations such…

1
Bringing AI Into The Inner Loop of Data Engineering With Ascend 52:47

8M ago52:47

52:47

Summary In this episode of the Data Engineering Podcast Sean Knapp, CEO of Ascend.io, explores the intersection of AI and data engineering. He discusses the evolution of data engineering and the role of AI in automating processes, alleviating burdens on data engineers, and enabling them to focus on complex tasks and innovation. The conversation cov…

1
Astronomer's Role in the Airflow Ecosystem: A Deep Dive with Pete DeJoy 51:41

9M ago51:41

51:41

Summary In this episode of the Data Engineering Podcast Pete DeJoy, co-founder and product lead at Astronomer, talks about building and managing Airflow pipelines on Astronomer and the upcoming improvements in Airflow 3. Pete shares his journey into data engineering, discusses Astronomer's contributions to the Airflow project, and highlights the cr…

1
Accelerated Computing in Modern Data Centers With Datapelago 55:36

9M ago55:36

55:36

Summary In this episode of the Data Engineering Podcast Rajan Goyal, CEO and co-founder of Datapelago, talks about improving efficiencies in data processing by reimagining system architecture. Rajan explains the shift from hyperconverged to disaggregated and composable infrastructure, highlighting the importance of accelerated computing in modern d…

1
The Future of Data Engineering: AI, LLMs, and Automation 59:39

9M ago59:39

59:39

Summary In this episode of the Data Engineering Podcast Gleb Mezhanskiy, CEO and co-founder of DataFold, talks about the intersection of AI and data engineering. He discusses the challenges and opportunities of integrating AI into data engineering, particularly using large language models (LLMs) to enhance productivity and reduce manual toil. The c…

1
Evolving Responsibilities in AI Data Management 38:57

10M ago38:57

38:57

Summary In this episode of the Data Engineering Podcast Bartosz Mikulski talks about preparing data for AI applications. Bartosz shares his journey from data engineering to MLOps and emphasizes the importance of data testing over software development in AI contexts. He discusses the types of data assets required for AI applications, including exten…

1
CSVs Will Never Die And OneSchema Is Counting On It 54:40

11M ago54:40

54:40

Summary In this episode of the Data Engineering Podcast Andrew Luo, CEO of OneSchema, talks about handling CSV data in business operations. Andrew shares his background in data engineering and CRM migration, which led to the creation of OneSchema, a platform designed to automate CSV imports and improve data validation processes. He discusses the ch…

1
Breaking Down Data Silos: AI and ML in Master Data Management 57:30

11M ago57:30

57:30

Summary In this episode of the Data Engineering Podcast Dan Bruckner, co-founder and CTO of Tamr, talks about the application of machine learning (ML) and artificial intelligence (AI) in master data management (MDM). Dan shares his journey from working at CERN to becoming a data expert and discusses the challenges of reconciling large-scale organiz…

1
Building a Data Vision Board: A Guide to Strategic Planning 49:59

11M ago49:59

49:59

Summary In this episode of the Data Engineering Podcast Lior Barak shares his insights on developing a three-year strategic vision for data management. He discusses the importance of having a strategic plan for data, highlighting the need for data teams to focus on impact rather than just enablement. He introduces the concept of a "data vision boar…

1
How Orchestration Impacts Data Platform Architecture 59:39

12M ago59:39

59:39

Summary The core task of data engineering is managing the flows of data through an organization. In order to ensure those flows are executing on schedule and without error is the role of the data orchestrator. Which orchestration engine you choose impacts the ways that you architect the rest of your data platform. In this episode Hugo Lu shares his…

1
An Exploration Of The Impediments To Reusable Data Pipelines 51:32

12M ago51:32

51:32

Summary In this episode of the Data Engineering Podcast the inimitable Max Beauchemin talks about reusability in data pipelines. The conversation explores the "write everything twice" problem, where similar pipelines are built without code reuse, and discusses the challenges of managing different SQL dialects and relational databases. Max also touc…

1
The Art of Database Selection and Evolution 59:56

1y ago59:56

59:56

Summary In this episode of the Data Engineering Podcast Sam Kleinman talks about the pivotal role of databases in software engineering. Sam shares his journey into the world of data and discusses the complexities of database selection, highlighting the trade-offs between different database architectures and how these choices affect system design, q…

1
Bridging Code and UI in Data Orchestration with Kestra 44:30

1y ago44:30

44:30

Summary In this episode of the Data Engineering Podcast, Anna Geller talks about the integration of code and UI-driven interfaces for data orchestration. Anna defines data orchestration as automating the coordination of workflow nodes that interact with data across various business functions, discussing how it goes beyond ETL and analytics to enabl…

1
Streaming Data Into The Lakehouse With Iceberg And Trino At Going 39:49

1y ago39:49

39:49

In this episode, I had the pleasure of speaking with Ken Pickering, VP of Engineering at Going, about the intricacies of streaming data into a Trino and Iceberg lakehouse. Ken shared his journey from product engineering to becoming deeply involved in data-centric roles, highlighting his experiences in ecommerce and InsurTech. At Going, Ken leads th…

1
An Opinionated Look At End-to-end Code Only Analytical Workflows With Bruin 56:11

1y ago56:11

56:11

Summary The challenges of integrating all of the tools in the modern data stack has led to a new generation of tools that focus on a fully integrated workflow. At the same time, there have been many approaches to how much of the workflow is driven by code vs. not. Burak Karakan is of the opinion that a fully integrated workflow that is driven entir…

1
Feldera: Bridging Batch and Streaming with Incremental Computation 47:36

1y ago47:36

47:36

Summary In this episode of the Data Engineering Podcast, the creators of Feldera talk about their incremental compute engine designed for continuous computation of data, machine learning, and AI workloads. The discussion covers the concept of incremental computation, the origins of Feldera, and its unique ability to handle both streaming and batch …

1
Accelerate Migration Of Your Data Warehouse with Datafold's AI Powered Migration Agent 48:50

1y ago48:50

48:50

Summary Gleb Mezhanskiy, CEO and co-founder of DataFold, joins Tobias Macey to discuss the challenges and innovations in data migrations. Gleb shares his experiences building and scaling data platforms at companies like Autodesk and Lyft, and how these experiences inspired the creation of DataFold to address data quality issues across teams. He out…

1
Bring Vector Search And Storage To The Data Lake With Lance 58:01

1y ago58:01

58:01

Summary The rapid growth of generative AI applications has prompted a surge of investment in vector databases. While there are numerous engines available now, Lance is designed to integrate with data lake and lakehouse architectures. In this episode Weston Pace explains the inner workings of the Lance format for table definitions and file storage, …

1
The Role of Python in Shaping the Future of Data Platforms with DLT 54:08

1y ago54:08

54:08

Summary In this episode of the Data Engineering Podcast, Adrian Broderieux and Marcin Rudolph, co-founders of DLT Hub, delve into the principles guiding DLT's development, emphasizing its role as a library rather than a platform, and its integration with lakehouse architectures and AI application frameworks. The episode explores the impact of the P…