Welcome to The Flux - where we talk data, decisions, and stories of people asking the what-if questions to create an intentional impact on the future.
Deep Papers is a podcast series featuring deep dives on today’s most important AI papers and research. Hosted by Arize AI founders and engineers, each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning.

Mining Meaning: Laura Sheppard on Gender, Academia, and the Power of Public Data
39:40
In this episode of The Flux, we talk with Laura Sheppard, a research fellow at University College London’s Centre for Longitudinal Studies, about how data mining can uncover powerful insights from unexpected sources. Laura shares her work using the British Library’s Ethos dataset, a comprehensive record of UK doctoral theses, to explore gender ine…

LibreEval: The Largest Open Source Benchmark for RAG Hallucination Detection
27:19
For this week's paper read, we actually dive into our own research. We wanted to create a replicable, evolving dataset that can keep pace with model training so that you always know you're testing with data your model has never seen before. We also saw the prohibitively high cost of running LLM evals at scale, and have used our data to fine-tune a …

From Micro-Behaviors to Macro-Patterns: Exploring Agent-Based Models with Andrew Crooks
30:12
In this episode of The Flux, host John Cordier sits down with Andrew Crooks at the Complex Social Systems Society Conference in Santa Fe. They dive into the world of agent-based modeling (ABM) - what it is, why it matters, and how it helps us simulate and better understand human behavior in complex systems. From simulating traffic jams to modeling …

AI Benchmark Deep Dive: Gemini 2.5 and Humanity's Last Exam
26:11
This week we talk about modern AI benchmarks, taking a close look at Google's recent Gemini 2.5 release and its performance on key evaluations, notably Humanity's Last Exam (HLE). In the session we covered Gemini 2.5's architecture, its advancements in reasoning and multimodality, and its impressive context window. We also talked about how benchmar…
We cover Anthropic’s groundbreaking Model Context Protocol (MCP). Though it was released in November 2024, we've been seeing a lot of hype around it lately, and thought it was well worth digging into. Learn how this open standard is revolutionizing AI by enabling seamless integration between LLMs and external data sources, fundamentally transformin…

From Neural Networks to Synthetic Populations: Hamdi Kavak’s Journey
25:59
What if we could predict the impact of a pandemic, a policy change, or even a war before it happens? Hamdi Kavak joins The Flux to discuss how agent-based modeling enables researchers to explore "what if" scenarios and simulate real-world behaviors at both small and massive scales. We dive into the applications of ABM in government-funded projects,…

From Curiosity to Complexity: Paul Amoruso’s Journey into Agent-Based Modeling
18:39
In this episode of The Flux, host John Cordier, CEO of Epistemix, sits down with Paul Amoruso to explore his unexpected journey into the world of agent-based modeling (ABM). What started as a course recommendation turned into a passion for understanding complex systems and using data to drive better decisions. Paul shares insights from his academic…

AI Roundup: DeepSeek’s Big Moves, Claude 3.7, and the Latest Breakthroughs
30:23
This week, we're mixing things up a little bit. Instead of diving deep into a single research paper, we cover the biggest AI developments from the past few weeks. We break down key announcements, including: DeepSeek’s Big Launch Week: A look at FlashMLA (DeepSeek’s new approach to efficient inference) and DeepEP (their enhanced pretraining method).…

How DeepSeek is Pushing the Boundaries of AI Development
29:54
This week, we dive into DeepSeek. SallyAnn DeLucia, Product Manager at Arize, and Nick Luzio, a Solutions Engineer, break down key insights on a model that has been dominating headlines for its significant breakthrough in inference speed over other models. What’s next for AI (and open source)? From training strategies to real-world performance, here’s …

Complexity in Government Contracting: Jim Malone's Journey into Computational Social Science
18:21
In this episode of The Flux, host John Cordier interviews Jim Malone at the Complex Social Systems Conference in Santa Fe. Jim, an experienced government acquisition professional, shares his fascinating journey from contracting to pursuing a PhD in Computational Social Science. He discusses the importance of understanding complexity in systems, par…

Multiagent Finetuning: A Conversation with Researcher Yilun Du
30:03
We talk to Google DeepMind Senior Research Scientist (and incoming Assistant Professor at Harvard), Yilun Du, about his latest paper "Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains." This paper introduces a multiagent finetuning framework that enhances the performance and diversity of language models by employing a society of…

Decision-Making in Complex Systems: Insights from Agent-Based Modeling with Aaron Frank
38:48
Welcome to The Flux! In this episode, host John Cordier, CEO at Epistemix, explores the intricacies of agent-based modeling and its impact on decision-making with Aaron Frank. Recorded live at the Complex Social System Society of America's conference in Santa Fe, New Mexico, Aaron shares his journey from traditional national security research to co…

Training Large Language Models to Reason in Continuous Latent Space
24:58
LLMs have typically been restricted to reason in the "language space," where chain-of-thought (CoT) is used to solve complex reasoning problems. But a new paper argues that language space may not always be the best for reasoning. In this paper read, we cover an exciting new technique from a team at Meta called Chain of Continuous Thought, also known…

Democracy 3.0: Bridging Policy and Technology with Tom Pike
24:04
In this episode of The Flux, hosted by John Cordier, CEO at Epistemix, we delve into the fascinating world of agent-based modeling (ABM) with Tom Pike, co-lead for MESA. Recorded live at the Complex Systems Society conference in Santa Fe, New Mexico, Tom shares insights on democratizing ABM through Python, its significance in decision-making, and i…

Randy Burgh: Insights from Economic Development and Complexity Economics
37:40
This episode explores the economic development potential of technology developed in New Mexico, focusing on industry-based research and the collaboration between companies, universities, and government labs. It delves into the evolution of technology, highlighting key insights from Dr. Brown of the Santa Fe Institute, whose work integrates the conc…

LLMs as Judges: A Comprehensive Survey on LLM-Based Evaluation Methods
28:57
We discuss a major survey of work and research on LLM-as-Judge from the last few years. "LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods" systematically examines the LLMs-as-Judge framework across five dimensions: functionality, methodology, applications, meta-evaluation, and limitations. This survey gives us a bird's-eye view…

Navigating Complexity: Timothy Clancy on Modeling and Policy-Making
32:58
In this episode of The Flux, hosted by John Cordier, CEO of Epistemix, we hear from Timothy Clancy, a researcher from the University of Maryland's START program. Recorded at the Complex Social Systems Society of the Americas Conference in Santa Fe, New Mexico, Clancy shares insights from his career, which spans government work and applied studies o…

Merge, Ensemble, and Cooperate! A Survey on Collaborative LLM Strategies
28:47
LLMs have revolutionized natural language processing, showcasing remarkable versatility and capabilities. But individual LLMs often exhibit distinct strengths and weaknesses, influenced by differences in their training corpora. This diversity poses a challenge: how can we maximize the efficiency and utility of LLMs? A new paper, "Merge, Ensemble, a…

Exploring Agent-Based Modeling for Complex Policy Challenges with Sherwin Brown
10:36
In this episode of The Flux, host John Cordier, CEO at Epistemix, interviews Sherwin Brown from MITRE. Sherwin shares his journey into agent-based modeling, starting from his background in health management and policy. He discusses the role of data science and analytics in solving complex problems, such as monitoring the impact of the Affordable Ca…

Agent-as-a-Judge: Evaluate Agents with Agents
24:54
This week, we break down the “Agent-as-a-Judge” framework, a new agent evaluation paradigm that’s kind of like getting robots to grade each other’s homework. Where typical evaluation methods focus solely on outcomes or demand extensive manual work, this approach uses agent systems to evaluate agent systems, offering intermediate feedback throughout …

Exploring Data-Driven Decision Making with Bill Rand
40:22
In this episode of 'The Flux,' we dive deep into data-driven decision making with Bill Rand from NC State University. Join us as Bill discusses his work in agent-based modeling, social media misinformation, and various innovative projects. Discover how agent-based modeling can provide insights into human behavior, business analytics, and even inter…
We break down OpenAI’s realtime API. Learn how to seamlessly integrate powerful language models into your applications for instant, context-aware responses that drive user engagement. Whether you’re building chatbots, dynamic content tools, or enhancing real-time collaboration, we walk through the API’s capabilities, potential use cases, and best p…

Revolutionizing Engineering: The Power of AI and Simulation with Dave Freed
39:32
In this episode of The Flux, join Dave Freed, Senior Director at Ansys, as he delves into the transformative world of computer simulations. With a rich history from Exa Corporation to OnScale and Ansys, Dave explores the evolution and future of simulations, highlighting their pivotal role in automotive, aerospace, and nuclear engineering. Discover …

Swarm: OpenAI's Experimental Approach to Multi-Agent Systems
46:46
As multi-agent systems grow in importance for fields ranging from customer support to autonomous decision-making, OpenAI has introduced Swarm, an experimental framework that simplifies the process of building and managing these systems. Swarm, a lightweight Python library, is designed for educational purposes, stripping away complex abstractions to…
In this episode, we dive into the intriguing mechanics behind why chat experiences with models like GPT often start slow but then rapidly pick up speed. The key? The KV cache. This essential but under-discussed component enables the seamless and snappy interactions we expect from modern AI systems. Harrison Chu breaks down how the KV cache works, h…

Transforming Data into Decisions: Insights from Matt Madden at BYU
44:47
In this episode of The Flux, host John Cordier interviews Matt Madden, Director of the BYU Marketing Lab, about his work in making complex statistics and marketing analytics accessible. Madden discusses the lab's unique approach, which allows students to apply their skills in real-world consulting projects. They delve into key topics like market re…

The Shrek Sampler: How Entropy-Based Sampling is Revolutionizing LLMs
3:31
In this byte-sized podcast, Harrison Chu, Director of Engineering at Arize, breaks down the Shrek Sampler. This innovative entropy-based sampling technique, nicknamed the 'Shrek Sampler', is transforming LLMs. Harrison talks about how this method improves upon traditional sampling strategies by leveraging entropy and varentropy to produce more dynam…

Google's NotebookLM and the Future of AI-Generated Audio
43:28
This week, Aman Khan and Harrison Chu explore NotebookLM’s unique features, including its ability to generate realistic-sounding podcast episodes from text (but this podcast is very real!). They dive into some technical underpinnings of the product, specifically the SoundStorm model used for generating high-quality audio, and how it leverages a hie…

Navigating Complex Systems with Don Burke: Epidemiology, AI, and Modeling
44:31
In this episode of The Flux, host John Cordier sits down with Don Burke, co-founder of Epistemix and a trailblazing epidemiologist, to explore the fascinating intersection of infectious disease research, artificial intelligence, and agent-based modeling (ABM). Burke shares his journey from a traditional career in infectious disease research to beco…

The Tipping Point for Agent-Based Modeling with Rob Axtell
40:00
In this episode of The Flux, John Cordier interviews Rob Axtell from George Mason University, where he leads the largest graduate program in agent-based modeling (ABM) globally. Axtell shares his journey into complex systems modeling and how the field has evolved since the 1990s. He explains how George Mason’s Ph.D. program in Computational Social …

Exploring OpenAI's o1-preview and o1-mini
42:02
OpenAI recently released its o1-preview, which they claim outperforms GPT-4o on a number of benchmarks. These models are designed to think more before answering and handle complex tasks better than their other models, especially science and math questions. We take a closer look at their latest crop of o1 models, and we also highlight some research …

Breaking Down Reflection Tuning: Enhancing LLM Performance with Self-Learning
26:54
A recent announcement on X boasted a tuned model with pretty outstanding performance, and claimed these results were achieved through Reflection Tuning. However, people were unable to reproduce the results. We dive into some recent drama in the AI community as a jumping-off point for a discussion about Reflection 70B. In 2023, there was a paper wri…

The Intersection of Science Fiction and Reality: A Conversation with Sam Arbesman
37:10
In The Intersection of Science Fiction and Reality episode of The Flux podcast, host John Cordier engages in a conversation with Sam Arbesman, Scientist-in-Residence at Lux Capital and Research Fellow at the Long Now Foundation. They explore how science fiction, video games, and computational social science intersect to influence real-world innovat…

Composable Interventions for Language Models
42:35
This week, we're excited to be joined by Kyle O'Brien, Applied Scientist at Microsoft, to discuss his most recent paper, Composable Interventions for Language Models. Kyle and his team present a new framework, composable interventions, that allows for the study of multiple interventions applied sequentially to the same language model. The discussio…

The Growing Impact of Agent-Based Modeling with Matt Kohler
32:21
In this episode of The Flux, John Cordier interviews Matt Kohler, Applied Complexity Scientist at MITRE and President of the Computational Social Science Society of the Americas, about the transformative power of agent-based modeling (ABM). Kohler explains how ABM simulates complex human systems and helps decision-makers understand the ripple effec…

Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
39:05
This week’s paper presents a comprehensive study of the performance of various LLMs acting as judges. The researchers leverage TriviaQA as a benchmark for assessing objective knowledge reasoning of LLMs and evaluate them alongside human annotations, which they find to have high inter-annotator agreement. The study includes nine judge models and ni…

The Future of Agent-Based Modeling: Insights from Josh Epstein
39:05
In the inaugural episode of The Flux, John Cordier, CEO of Epistemix, interviews Josh Epstein, Director of the Agent-Based Modeling Lab at NYU and a prominent figure at the Santa Fe Institute. The discussion revolves around the potential and progress of agent-based modeling (ABM), particularly in public health, economics, and beyond. Epstein shares…

Breaking Down Meta's Llama 3 Herd of Models
44:40
Meta just released Llama 3.1 405B. According to them, it’s “the first openly available model that rivals the top AI models when it comes to state-of-the-art capabilities in general knowledge, steerability, math, tool use, and multilingual translation.” Will the latest Llama herd ignite new applications and modeling paradigms like synthetic data gene…

DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines
33:57
Chaining language model (LM) calls as composable modules is fueling a new way of programming, but ensuring LMs adhere to important constraints requires heuristic “prompt engineering.” The paper this week introduces LM Assertions, a programming construct for expressing computational constraints that LMs should satisfy. The researchers integrated the…

RAFT: Adapting Language Model to Domain Specific RAG
44:01
Adapting LLMs to specialized domains (e.g., recent news, enterprise private documents) is essential, so we discuss a paper that asks how to adapt pre-trained LLMs for RAG in specialized domains. SallyAnn DeLucia is joined by Sai Kolasani, researcher at UC Berkeley’s RISE Lab (and Arize AI Intern), to talk about his work on RAFT: Adapting Languag…

LLM Interpretability and Sparse Autoencoders: Research from OpenAI and Anthropic
44:00
It’s been an exciting couple of weeks for GenAI! Join us as we discuss the latest research from OpenAI and Anthropic. We’re excited to chat about this significant step forward in understanding how LLMs work and the implications it has for deeper understanding of the neural activity of language models. We take a closer look at some recent research from…

Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models' Alignment
48:07
We break down the paper "Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models' Alignment." Ensuring alignment (aka: making models behave in accordance with human intentions) has become a critical task before deploying LLMs in real-world applications. However, a major challenge faced by practitioners is the lack of clear guid…

Breaking Down EvalGen: Who Validates the Validators?
44:47
Due to the cumbersome nature of human evaluation and limitations of code-based evaluation, Large Language Models (LLMs) are increasingly being used to assist humans in evaluating LLM outputs. Yet LLM-generated evaluators often inherit the problems of the LLMs they evaluate, requiring further human validation. This week’s paper explores EvalGen, a m…

Keys To Understanding ReAct: Synergizing Reasoning and Acting in Language Models
45:07
This week we explore ReAct, an approach that enhances the reasoning and decision-making capabilities of LLMs by combining step-by-step reasoning with the ability to take actions and gather information from external sources in a unified framework. Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest o…

Demystifying Chronos: Learning the Language of Time Series
44:40
This week, we’re covering Amazon’s time series model: Chronos. Developing accurate machine-learning-based forecasting models has traditionally required substantial dataset-specific tuning and model customization. Chronos, however, is built on a language model architecture and trained with billions of tokenized time series observations, enabling it t…
This week we dive into the latest buzz in the AI world: the arrival of Claude 3. Claude 3 is the newest family of models in the LLM space, and Claude 3 Opus (Anthropic's "most intelligent" Claude model) challenges the likes of GPT-4. The Claude 3 family of models, according to Anthropic, "sets new industry benchmarks," and includes "three state-o…

Reinforcement Learning in the Era of LLMs
44:49
We’re exploring Reinforcement Learning in the Era of LLMs this week with Claire Longo, Arize’s Head of Customer Success. Recent advancements in Large Language Models (LLMs) have garnered wide attention and led to successful products such as ChatGPT and GPT-4. Their proficiency in adhering to instructions and delivering harmless, helpful, and honest…

Sora: OpenAI’s Text-to-Video Generation Model
45:08
This week, we discuss the implications of Text-to-Video Generation and speculate as to the possibilities (and limitations) of this incredible technology with some hot takes. Dat Ngo, ML Solutions Engineer at Arize, is joined by community member and AI Engineer Vibhu Sapra to review OpenAI’s technical report on their Text-To-Video Generation Model: …
This week, we’re discussing "RAG vs Fine-Tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture." This paper explores a pipeline for fine-tuning and RAG, and presents the tradeoffs of both for multiple popular LLMs, including Llama2-13B, GPT-3.5, and GPT-4. The authors propose a pipeline that consists of multiple stages, including extracting …

HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels
36:22
We discuss HyDE: a thrilling zero-shot learning technique that combines GPT-3’s language understanding with contrastive text encoders. HyDE revolutionizes information retrieval and grounding in real-world data by generating hypothetical documents from queries and retrieving similar real-world documents. It outperforms traditional unsupervised retri…