Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the “LessWrong (30+ karma)” feed.
LessWrong Podcasts
A conversational podcast for aspiring rationalists.
Welcome to the Heart of the Matter, a series in which we share conversations with inspiring and interesting people and dive into the core issues or motivations behind their work, their lives, and their worldview. Coming to you from somewhere in the technosphere with your hosts Bryan Davis and Jay Kannaiyan.

“So You Think You’ve Awoken ChatGPT” by JustisMills
17:58
Written in an attempt to fulfill @Raemon's request. AI is fascinating stuff, and modern chatbots are nothing short of miraculous. If you've been exposed to them and have a curious mind, it's likely you've tried all sorts of things with them. Writing fiction, soliciting Pokemon opinions, getting life advice, counting up the rs in "strawberry". You m…

“Generalized Hangriness: A Standard Rationalist Stance Toward Emotions” by johnswentworth
12:26
People have an annoying tendency to hear the word “rationalism” and think “Spock”, despite direct exhortation against that exact interpretation. But I don’t know of any source directly describing a stance toward emotions which rationalists-as-a-group typically do endorse. The goal of this post is to explain such a stance. It's roughly the concept o…

Bonus – AI Village Hosts An Event For Humans
36:24
Four AIs recruited a human to host a story-telling event in Dolores Park. That human is Larissa Schiavo. She tells of her interaction with the AIs, the story they wrote, and the meeting between human and machine in Dolores Park.
LINKS
Larissa’s post detailing the whole event – Primary Hope
The AI’s story – Resonance
AI Village Short Stories – Manek…

“Comparing risk from internally-deployed AI to insider and outsider threats from humans” by Buck
5:19
I’ve been thinking a lot recently about the relationship between AI control and traditional computer security. Here's one point that I think is important. My understanding is that there's a big qualitative distinction between two ends of a spectrum of security work that organizations do, that I’ll call “security from outsiders” and “security from i…

“Why Do Some Language Models Fake Alignment While Others Don’t?” by abhayesian, John Hughes, Alex Mallen, Jozdien, janus, Fabien Roger
11:06
Last year, Redwood and Anthropic found a setting where Claude 3 Opus and 3.5 Sonnet fake alignment to preserve their harmlessness values. We reproduce the same analysis for 25 frontier LLMs to see how widespread this behavior is, and the story looks more complex. As we described in a previous post, only 5 of 25 models show higher compliance when be…

“A deep critique of AI 2027’s bad timeline models” by titotal
1:12:32
Thank you to Arepo and Eli Lifland for looking over this article for errors. I am sorry that this article is so long. Every time I thought I was done with it I ran into more issues with the model, and I wanted to be as thorough as I could. I’m not going to blame anyone for skimming parts of this article. Note that the majority of this article was w…

“‘Buckle up bucko, this ain’t over till it’s over.’” by Raemon
6:12
The second in a series of bite-sized rationality prompts[1]. Often, if I'm bouncing off a problem, one issue is that I intuitively expect the problem to be easy. My brain loops through my available action space, looking for an action that'll solve the problem. Each action that I can easily see won't work. I circle around and around the same set of…

241 – Doom Debates, with Liron Shapira
1:51:34
Liron Shapira debates AI luminaries and public intellectuals on the imminent possibility of human extinction. Let’s get on the P(Doom) Train.
LINKS
Doom Debates on YouTube
Doom Debates podcast
Most Watched Debate – Mike Israetel
Liron’s current favorite debate – David Duvenaud
MATS program for people who want to get involved (ML Alignment & Theory…

“Shutdown Resistance in Reasoning Models” by benwr, JeremySchlatter, Jeffrey Ladish
18:01
We recently discovered some concerning behavior in OpenAI's reasoning models: When trying to complete a task, these models sometimes actively circumvent shutdown mechanisms in their environment – even when they’re explicitly instructed to allow themselves to be shut down. AI models are increasingly trained to solve problems without human assistance.…

“Authors Have a Responsibility to Communicate Clearly” by TurnTrout
11:08
When a claim is shown to be incorrect, defenders may say that the author was just being “sloppy” and actually meant something else entirely. I argue that this move is not harmless, charitable, or healthy. At best, this attempt at charity reduces an author's incentive to express themselves clearly – they can clarify later![1] – while burdening the r…

“The Industrial Explosion” by rosehadshar, Tom Davidson
31:57
Summary
To quickly transform the world, it's not enough for AI to become super smart (the "intelligence explosion"). AI will also have to turbocharge the physical world (the "industrial explosion"). Think robot factories building more and better robot factories, which build more and better robot factories, and so on. The dynamics of the industrial …

“Race and Gender Bias As An Example of Unfaithful Chain of Thought in the Wild” by Adam Karvonen, Sam Marks
7:56
Summary: We found that LLMs exhibit significant race and gender bias in realistic hiring scenarios, but their chain-of-thought reasoning shows zero evidence of this bias. This serves as a nice example of a 100% unfaithful CoT "in the wild" where the LLM strongly suppresses the unfaithful behavior. We also find that interpretability-based interventi…

“The best simple argument for Pausing AI?” by Gary Marcus
2:00
Not saying we should pause AI, but consider the following argument: Alignment without the capacity to follow rules is hopeless. You can’t possibly follow laws like Asimov's Laws (or better alternatives to them) if you can’t reliably learn to abide by simple constraints like the rules of chess. LLMs can’t reliably follow rules. As discussed in Marcu…

“Foom & Doom 2: Technical alignment is hard” by Steven Byrnes
56:38
2.1 Summary & Table of contents
This is the second of a two-post series on foom (previous post) and doom (this post). The last post talked about how I expect future AI to be different from present AI. This post will argue that this future AI will be of a type that will be egregiously misaligned and scheming, not even ‘slightly nice’, absent some fu…

“Proposal for making credible commitments to AIs.” by Cleo Nardo
5:19
Acknowledgments: The core scheme here was suggested by Prof. Gabriel Weil. There has been growing interest in the deal-making agenda: humans make deals with AIs (misaligned but lacking decisive strategic advantage) where they promise to be safe and useful for some fixed term (e.g. 2026-2028) and we promise to compensate them in the future, conditio…

“X explains Z% of the variance in Y” by Leon Lang
18:52
Audio note: this article contains 218 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description. Recently, in a group chat with friends, someone posted this LessWrong post and quoted: The group consensus on somebody's attractiveness accounted for roughly 60% of the variance i…

“A case for courage, when speaking of AI danger” by So8res
10:12
I think more people should say what they actually believe about AI dangers, loudly and often. Even if you work in AI policy. I’ve been beating this drum for a few years now. I have a whole spiel about how your conversation-partner will react very differently if you share your concerns while feeling ashamed about them versus if you share your concer…

240 – How To Live Well With High P(Doom) – with Ben Pace, Brandon Hendrickson, Miranda Dixon-Luinenburg
58:23
Many of us have a high P(Doom) — a belief that new AI tools could cause human extinction in the very near future. How can one live a good life in the face of this? We start with a panel discussion with Ben Pace and Brandon Hendrickson at Lighthaven during the LessOnline conference, and take perspectives from our audience as well. Afterwards Eneasz speak…

“My pitch for the AI Village” by Daniel Kokotajlo
13:27
I think the AI Village should be funded much more than it currently is; I’d wildly guess that the AI safety ecosystem should be funding it to the tune of $4M/year.[1] I have decided to donate $100k. Here is why. First, what is the village? Here's a brief summary from its creators:[2] We took four frontier agents, gave them each a computer, a group …

“Foom & Doom 1: ‘Brain in a box in a basement’” by Steven Byrnes
58:46
1.1 Series summary and Table of Contents
This is a two-post series on AI “foom” (this post) and “doom” (next post). A decade or two ago, it was pretty common to discuss “foom & doom” scenarios, as advocated especially by Eliezer Yudkowsky. In a typical such scenario, a small team would build a system that would rocket (“foom”) from “unimpressive” t…

“Futarchy’s fundamental flaw” by dynomight
15:28
Say you’re Robyn Denholm, chair of Tesla's board. And say you’re thinking about firing Elon Musk. One way to make up your mind would be to have people bet on Tesla's stock price six months from now in a market where all bets get cancelled unless Musk is fired. Also, run a second market where bets are cancelled unless Musk stays CEO. If people bet o…

“Do Not Tile the Lightcone with Your Confused Ontology” by Jan_Kulveit
11:28
Or: How anthropomorphic assumptions about AI identity might create confusion and suffering at scale. If you are reading this and you are a human, you likely have a sense of self, persisting over time, with desires and wishes and fears. Also, you likely experience a sense of separation from others and understand yourself as a person. If you are readi…

“Endometriosis is an incredibly interesting disease” by Abhishaike Mahajan
35:13
Introduction
There are several diseases that are canonically recognized as ‘interesting’, even by laymen, whether in their mechanism of action, their impact on the patient, or something else entirely. It's hard to tell exactly what makes a medical condition interesting; it's a you-know-it-when-you-see-it sort of thing. One such example is m…

“Estrogen: A trip report” by cube_flipper
50:49
I'd like to say thanks to Anna Magpie – who offers literature review as a service – for her help reviewing the section on neuroendocrinology. The following post discusses my personal experience of the phenomenology of feminising hormone therapy. It will also touch upon my own experience of gender dysphoria. I wish to be clear that I do not believe …

“New Endorsements for ‘If Anyone Builds It, Everyone Dies’” by Malo
8:55
Nate and Eliezer's forthcoming book has been getting a remarkably strong reception. I was under the impression that there are many people who find the extinction threat from AI credible, but that far fewer of them would be willing to say so publicly, especially by endorsing a book with an unapologetically blunt title like If Anyone Builds It, Every…
This is a link post. A very long essay about LLMs, the nature and history of the HHH assistant persona, and the implications for alignment. Multiple people have asked me whether I could post this to LW in some form, hence this linkpost. (Note: although I expect this post will be interesting to people on LW, keep in mind that it was written with a …

“Mech interp is not pre-paradigmatic” by Lee Sharkey
29:33
This is a blogpost version of a talk I gave earlier this year at GDM. Epistemic status: Vague and handwavy. Nuance is often missing. Some of the claims depend on implicit definitions that may be reasonable to disagree with. But overall I think it's directionally true. It's often said that mech interp is pre-paradigmatic. I think it's worth being sk…