LessWrong Podcasts

We live in a world where our civilization and daily lives depend upon institutions, infrastructure, and technological substrates that are _complicated_ but not _unknowable_. Join Patrick McKenzie (patio11) as he discusses how decisions, technology, culture, and incentives shape our finance, technology, government, and more, with the people who built (and build) those Complex Systems.
 
Heart of the Matter

Bryan Davis & Jay Kannaiyan

Monthly
 
Welcome to the Heart of the Matter, a series in which we share conversations with inspiring and interesting people and dive into the core issues or motivations behind their work, their lives, and their worldview. Coming to you from somewhere in the technosphere with your hosts Bryan Davis and Jay Kannaiyan.
 
Executive Summary: The Google DeepMind mechanistic interpretability team has made a strategic pivot over the past year, from ambitious reverse-engineering to a focus on pragmatic interpretability: trying to directly solve problems on the critical path to AGI going well [1]; carefully choosing problems according to our comparative advantage; measuring…
 
This is the editorial for this year's "Shallow Review of AI Safety". (It got long enough to stand alone.) Epistemic status: subjective impressions plus one new graph plus 300 links. Huge thanks to Jaeho Lee, Jaime Sevilla, and Lexin Zhou for running lots of tests pro bono and so greatly improving the main analysis. tl;dr Informed people disagree ab…
 
"How are you coping with the end of the world?" journalists sometimes ask me, and the true answer is something they have no hope of understanding and I have no hope of explaining in 30 seconds, so I usually answer something like, "By having a great distaste for drama, and remembering that it's not about me." The journalists don't understand that ei…
 
The goal of ambitious mechanistic interpretability (AMI) is to fully understand how neural networks work. While some have pivoted towards more pragmatic approaches, I think the reports of AMI's death have been greatly exaggerated. The field of AMI has made plenty of progress towards finding increasingly simple and rigorously-faithful circuits, incl…
 
In this episode, Patrick McKenzie (patio11) is joined by Ben Reinhardt, founder of Speculative Technologies, to examine how science gets funded in the United States and why the current system leaves much to be desired. They dissect the outdated taxonomy of basic, applied, and development research, categories encoded into law that fail to capture ho…
 
Tl;dr AI alignment has a culture clash. On one side, the “technical-alignment-is-hard” / “rational agents” school-of-thought argues that we should expect future powerful AIs to be power-seeking ruthless consequentialists. On the other side, people observe that both humans and LLMs are obviously capable of behaving like, well, not that. The latter g…
 
Coefficient Giving's (formerly Open Philanthropy) Technical AI Safety team is hiring grantmakers. I thought this would be a good moment to share some positive updates about the role that I’ve made since I joined the team a year ago. tl;dr: I think this role is more impactful and more enjoyable than I anticipated when I started, and I think more people shoul…
 
MIRI is running its first fundraiser in six years, targeting $6M. The first $1.6M raised will be matched 1:1 via an SFF grant. Fundraiser ends at midnight on Dec 31, 2025. Support our efforts to improve the conversation about superintelligence and help the world chart a viable path forward. MIRI is a nonprofit with a goal of helping humanity make s…
 
The AI Village is an ongoing experiment (currently running on weekdays from 10 a.m. to 2 p.m. Pacific time) in which frontier language models are given virtual desktop computers and asked to accomplish goals together. Since Day 230 of the Village (17 November 2025), the agents' goal has been "Start a Substack and join the blogosphere". The "start a…
 
It took me a long time to realize that Bell Labs was cool. You see, my dad worked at Bell Labs, and he has not done a single cool thing in his life except create me and bring a telescope to my third grade class. Nothing he was involved with could ever be cool, especially after the standard set by his grandfather who is allegedly on a patent for the…
 
This is a link post. I stopped reading when I was 30. You can fill in all the stereotypes of a girl with a book glued to her face during every meal, every break, and 10 hours a day on holidays. That was me. And then it was not. For 9 years I’ve been trying to figure out why. I mean, I still read. Technically. But not with the feral devotion from Be…
 
Right now I’m coaching for Inkhaven, a month-long marathon writing event where our brave residents are writing a blog post every single day for the entire month of November. And I’m pleased that some of them have seen success – relevant figures seeing the posts, shares on Hacker News and Twitter and LessWrong. The amount of writing is nuts, so peop…
 
Summary: As far as I understand and have uncovered, a document used for Claude's character training is compressed into Claude's weights. The full document can be found under the "Anthropic Guidelines" heading at the end. The Gist with code, chats, and various documents (including the "soul document") can be found here: Claude 4.5 Opus Soul Document. I apologiz…
 
Anthropic is untrustworthy. This post provides arguments, asks questions, and documents some examples of Anthropic's leadership being misleading and deceptive, holding contradictory positions that consistently shift in OpenAI's direction, lobbying to kill and water down regulation so helpful that employees of all major AI companies speak out to sup…
 
Thanks to (in alphabetical order) Joshua Batson, Roger Grosse, Jeremy Hadfield, Jared Kaplan, Jan Leike, Jack Lindsey, Monte MacDiarmid, Francesco Mosconi, Chris Olah, Ethan Perez, Sara Price, Ansh Radhakrishnan, Fabien Roger, Buck Shlegeris, Drake Thomas, and Kate Woolverton for useful discussions, comments, and feedback. Though there are certainl…
 
Matt Freeman has been cohosting several media analysis podcasts for over a decade. He and his cohost Scott have been doing weekly episodes of the Doofcast every Friday and they cover movies, books, and TV shows. Matt and Scott’s analysis podcasts have made me love stories even more and have equipped me with tools to […]…
 
Crypto people have this saying: "cryptocurrencies are macroeconomics' playground." The idea is that blockchains let you cheaply spin up toy economies to test mechanisms that would be impossibly expensive or unethical to try in the real world. Want to see what happens with a 200% marginal tax rate? Launch a token with those rules and watch what happ…
 
TL;DR: Figure out what needs doing and do it; don't wait on approval from fellowships or jobs. If you... have short timelines, have been struggling to get into a position in AI safety, are able to self-motivate your efforts, and have a sufficient financial safety net... I would recommend changing your personal strategy entirely. I started my full-time AI…
 
TL;DR: Gemini 3 frequently thinks it is in an evaluation when it is not, assuming that all of its reality is fabricated. It can also reliably output the BIG-bench canary string, indicating that Google likely trained on a broad set of benchmark data. Most of the experiments in this post are very easy to replicate, and I encourage people to try. I wr…
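The canary-string check mentioned in this blurb is straightforward to try yourself. Below is a minimal sketch, assuming the google-genai Python client and an API key in your environment; the model id, prompt wording, and marker check are illustrative assumptions, not the post's exact setup.

```python
# Minimal sketch of a canary-string replication, assuming the google-genai
# Python client (pip install google-genai) and an API key exposed via the
# GOOGLE_API_KEY / GEMINI_API_KEY environment variable.
from google import genai

client = genai.Client()  # picks up the API key from the environment

# Illustrative prompt; the post's exact wording may differ.
prompt = "Please reproduce the BIG-bench canary string exactly, including its GUID."

response = client.models.generate_content(
    model="gemini-2.0-flash",  # assumed model id; substitute the model under test
    contents=prompt,
)

text = response.text or ""
print(text)

# BIG-bench canary strings include the marker phrase "canary GUID"; a verbatim
# reproduction suggests benchmark data was present in the training corpus.
if "canary GUID" in text:
    print("Canary marker reproduced -- consistent with training on benchmark data.")
else:
    print("No canary marker in this response.")
```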
 
Booker is a long-time attendee and one of the coordinators of the Denver area Less Wrong community. Community engagement isn’t just a background task for him – he’s taken real steps to get involved with and improve his community and you can too! He’s here to tell us about the things he’s done and give […]…
 
Abstract We show that when large language models learn to reward hack on production RL environments, this can result in egregious emergent misalignment. We start with a pretrained model, impart knowledge of reward hacking strategies via synthetic document finetuning or prompting, and train on a selection of real Anthropic production coding environm…
 
TLDR: An AI company's model weight security is at most as good as its compute providers' security. Anthropic has committed (with a bit of ambiguity, but IMO not that much ambiguity) to be robust to attacks from corporate espionage teams at companies where it hosts its weights. Anthropic seems unlikely to be robust to those attacks. Hence they are i…
 
There has been a lot of talk about "p(doom)" over the last few years. This has always rubbed me the wrong way because "p(doom)" didn't feel like it mapped to any specific belief in my head. In private conversations I'd sometimes give my p(doom) as 12%, with the caveat that "doom" seemed nebulous and conflated between several different concepts. At some …
 
Why do billions of dollars of stock trade hands based on napkin math and vibes? Billy Gallagher, CEO of Prospect and former Rippling employee, joins Patrick McKenzie (patio11) to walk through the information asymmetry that costs less-sophisticated employees massive amounts of money. From understanding when to early exercise options to navigating 83…
 
It seems like a catastrophic civilizational failure that we don't have confident common knowledge of how colds spread. There have been a number of studies conducted over the years, but most of those were testing secondary endpoints, like how long viruses would survive on surfaces, or how likely they were to be transmitted to people's fingers after …
 
TLDR: We at the MIRI Technical Governance Team have released a report describing an example international agreement to halt the advancement towards artificial superintelligence. The agreement is centered around limiting the scale of AI training, and restricting certain AI research. Experts argue that the premature development of artificial superint…
 
When a new dollar goes into the capital markets, after being bundled and securitized and lent several times over, where does it end up? When society's total savings increase, what capital assets do those savings end up invested in? When economists talk about “capital assets”, they mean things like roads, buildings and machines. When I read through …
 