This collection of documents represents my latest attempt at articulating my opinions about the danger posed by AI. My views haven’t changed much since the last few iterations of doing this (see AI xrisk).

The change is best described as follows:

I am still ~90% confident that human extinction will occur fairly soon, but I am less confident that it’ll happen by the end of 2028. My probability mass for when we get destroyed is now spread somewhat more evenly over the next 10 years. I still think it’s quite plausible by 2028, but I wouldn’t be shocked if it took until 2033.

TODO:

I discussed this with Brendan and he actually gave a pretty compelling argument that some kind of “mutually assured destruction” regime (https://www.nationalsecurity.ai/) (MAIM) is semi-plausible. I haven’t thought about it extensively yet, but it seems potentially promising. This is the great part of sharing your ideas frequently: if you’re wrong, then hopefully someone will point it out! Please keep sending me arguments for how we can win --- I don’t think that MAIM is a guaranteed win. But it does genuinely sound like something that could happen.

TODO:

I’ve recently read through some of Paul’s thoughts on why alignment is easy. I think my main disagreement with Paul is over how responsibly I expect humanity to react to the problem. But Paul could be correct there too.

Anyways, this is a current snapshot of my views. Hopefully it is good enough to help me make decisions. I also hope it is of use to some readers.

This document is poorly written in places. I may at some point polish it and use it as a tool to communicate with people / try to build consensus on the issues, or to identify experiments that could resolve disagreements. I feel like this post does an okay job of listing many of the considerations people have for why the risks might not be too bad, and giving rebuttals to them.

Acknowledgements:

I’m grateful to Kevin, Tarushii, Brandon Westover, and many others for conversations about AI that have helped me refine my views on this.

Note: I’d love to hear where you disagree with me!


This document presents a dialogue between two imaginary characters: Bob and Alice. Bob is an AI xrisk skeptic talking in good faith with his friend Alice, who has thought about AI xrisk in depth. Alice will give arguments for why she believes AI poses a substantial xrisk, and rebut Bob’s skeptical claims that the risk is fake or minimal.

I hope that I have not strawmanned Bob too much. If you have an argument that Pr(AI xrisk) is small which is not already included in this document, please contact Alek; I’d be happy to include the argument and discuss it with you.

Introduction

Alice: Hi Bob.

Bob: Hi Alice.

Alice:

Today I’d like to talk about the danger posed to humanity by the development of superintelligent AI.

Bob:

Oh is that a movie that you were watching recently?

Alice:

No, I mean in real life.

Bob:

Oh hmm, that’s a pretty bold claim. I have several reasons why I’m skeptical right off the bat, before even hearing your argument:

  • Most of the time when someone makes a claim about the world ending, it’s a conspiracy theory.
  • The world feels very continuous --- if there were going to be some drastic change like this, I think I’d have noticed.
  • Computers just do what we tell them to! It’s not like they have their own goals!
  • If an AI is evil, then we’ll just turn it off!
  • Evil AI is a made up science fiction notion.
  • Even if an AI were evil, I don’t think it could cause that much harm: there are lots of evil humans, and they don’t cause too much harm.
  • If AI did pose a risk to humanity then people wouldn’t be working on building it.
  • If AI did pose such a risk, then there would be expert consensus about the issue.

So, it’d take some pretty strong reasons and evidence to convince me of this claim. But I think you’re pretty thoughtful, so I’m quite curious to hear you out on why you’re worried.

Alice:

The objections you’ve mentioned are pretty common first reactions to hearing about this issue! I think many of them are reasonable heuristics, which happen to be wrong in this case.

Let’s do the following:

  • I’ll start by explaining why the current trajectory of AI development is so dangerous.
  • Then I’ll respond to your above objections and any new objections that you have after hearing my argument.
  • We then iterate this process until we reach consensus.

Bob: ok, I’m ready.

Alice:

To be clear, here’s the claim I’ll argue for in the rest of the discussion:

Claim 1:

Pr(Humanity goes extinct because of AI in the next 10 years) > 1/2

Alice:

Note that I believe the situation is more dire than this, and that the danger comes sooner than 10 years, but getting more people on board with claim 1 seems like it could improve the situation, and so I’m choosing to focus on this.

My argument is factored into 4 steps; please use the links to navigate to whichever part of the argument you’re most interested in.

LEMMAS

Conclusion

Alice:

The fact that I have 4 lemmas might make a confused reader think that many things (>= 4) have to go wrong in very particular ways for catastrophe to occur.

First of all, I have very high confidence in each of the 4 lemmas, so if you union bound over their failure probabilities, the probability that any of the 4 lemmas fails is still very small.
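To make the union-bound step concrete, here is a tiny sketch. The per-lemma failure probabilities below are made-up placeholders for illustration, not my actual credences:

```python
# Union bound: P(at least one lemma fails) <= sum of the individual failure probabilities.
# The numbers here are hypothetical placeholders, not my actual credences.
failure_probs = [0.05, 0.05, 0.05, 0.05]  # one entry per lemma

upper_bound = sum(failure_probs)          # P(some lemma fails) is at most this
all_hold_lower_bound = 1 - upper_bound    # so all four lemmas hold with at least this probability

print(f"P(some lemma fails) <= {upper_bound:.2f}")   # 0.20
print(f"P(all lemmas hold)  >= {all_hold_lower_bound:.2f}")  # 0.80
```

Note that the union bound requires no independence assumption between the lemmas, which is what makes it a convenient worst-case estimate here.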

Second, these lemmas are not the whole story. By which I mean, there are lots of ways that the situation can go horribly wrong for humanity that I haven’t even mentioned.

For instance,

  • The tension that arises from world powers competing for dominance by building powerful AI could lead to a nuclear war.
  • Even if AI were aligned to individual users and only ever did what we intended, we might still be screwed as a society — there will be intense competitive pressure to replace humans with AIs, which will result in a world where all resources are controlled by AIs; and once humans have no power, it seems unlikely that they will stay around for long (consider how animals have fared). (See Krueger’s paper about this.)
  • If an aligned superintelligence and a misaligned superintelligence are built around the same time, then it’s not clear that the aligned superintelligence can protect us—destruction is easier than preventing destruction.

In many cases I’ve said “here are the most likely ways that AIs will be built, and why that would be bad”. But this is a hard problem: making minor tweaks to the design of the AI so that it doesn’t match my particular story won’t necessarily fix the problem. We must guard against unwarranted hope, or we will trick ourselves into thinking that our solution is safe when it is not, and then we will die.

The situation is robustly bad.

But, the fight for the future is not over yet.

I don’t even think we’re past the point of no return.

Please don’t despair.

Please don’t make the situation worse by working on AI capabilities.

Please try to make the situation better.

If you don’t, who do you think will?