Note — this post might come across as cynical or critical or not nice. Sorry, I didn’t really mean it that way! Consider reading something lighter like cooking instead. Also — remark: is “cooking” the only “non-heavy” blog post I’ve ever written on skyspace3? lol. good for me I guess.
Also consider reading the ending of the blog post — I basically overturn a lot of the cognitions in the post as irrational.
How would you like to die? Well, really I’d rather not. Okay, so it’s a sore subject. Maybe instead let me ask, how would you like to live? This seems like a better question for several reasons. For instance, it seems more actionable. If I am to die, I don’t think assigning lots of extra weight to that moment is good. If anything that moment should have the least weight, because I’ll never remember it.
Anyways, I’ve thought about this a bit and have a few answers:
- I’d like to live with great social interactions.
- I’d like to live trying to make the universe a better place — e.g., by doing technical alignment research.
- I’d like to live “agentically” / intentionally — for instance, not wasting time on silly technology.
- I’d like to live confidently / without much self-doubt.
- I’d like to live compassionately — with love towards myself and others.
- I’d like to continue to be thoughtful and recursively self-improve, e.g., by writing this blog!
These aren’t very specific of course, but I enjoy it when people hold me accountable / help me with life improvement, so it’s always a good conversation to ask me “how is trying to make the universe a better place, or socializing, etc. going?”
Anyways, this post is a record of me debugging the final item on the list above. As always, I’m making it public because I enjoy sharing my thoughts, and supposedly people enjoy reading the ramblings of Alek.
Okay, so here’s the problem:
Recently I’ve talked to a fairly large number of people about this (probably including you, if you’re a reader of this blog; and if not, go read about AI xrisk, ok, now I’ve kind of talked to you about it too). And I’ve been generally upset about people’s responses.
People’s responses are generally like the following (apologies if you’ve been caricatured below; the point of this post, which upon reflection it’s really not clear why I’m publishing, but whatever, is to first understand my thoughts, then understand why they’re wrong, and then fix them):
- Nah, AI won’t get good, or at least not soon enough for it to be my problem.
- AI might get good (although slower than you think), but it won’t want things, or it’ll just want nice things, or it won’t take actions to accomplish its goals, or we’ll turn the AI off if we realize it’s evil, and we’ll def realize it’s evil in time to respond if it actually is evil.
- AI might get good, and also might be evil, but it’s fine, there are evil people and they don’t take over the world.
- No one’s talking about AI being a big risk, so probably it’s not really.
- People wouldn’t really build AI if it’s so dangerous.
- Maybe alignment is just easy.
- Oh shoot that sounds really bad, and it’s so great that you’re going to do safety work and fix it!
- Oh that’s so cool!
- Eh, I can take 10% odds of extinction, that’s not so bad.
- Oh, hmm well that sounds really bad, I guess we’re screwed.
How I would like people to respond, I guess:
- Oh hmm, if you’re right about this, then this is like the biggest deal ever, and we really need to do something about it.
- Let me try to repeat your argument back to you, and see if I got it.
- Okay, I’m unsure about this part, could you tell me why you believe this part?
- Okay, I’m going to read about this some more, and try to think through the arguments myself, make sure there isn’t anything missing.
- (1 week later) — crap this is really really really bad!
- What should I do about it?
- Should I be contacting my political representatives and talking about it? The news people?
- Raising awareness in my circle of influence (and trying to expand this circle)?
- Donating to the Long Term Future Foundation?
- Doing technical AI safety work (e.g., AI control, or evals)?
In an earlier blog post my proposed strategy for fixing this was something like “yell WE’RE ALL GOING TO DIE we need to DO SOMETHING to STOP THIS, but louder”. There’s some merit to that approach: trying to be understood is better than just giving up on being understood. As is probably mentioned somewhere, understanding and feeling understood are some of the most awesome parts of connecting with other intelligences. But, I think this approach is slightly problematic because it consists of me desiring to be understood without trying to understand other people. And, as it’s likely that most people will not understand me in this way, it’d also be healthy to learn how to be at peace with that, and not have that be a mental burden. Not have that be a wedge between me and friends. Not have that become a regret. (Remark — avoiding having regrets seems like a really good idea, especially if life is short).
So, we’ll break it into two parts:
- Understanding other people.
- Finding peace without understanding.
Understanding other people
What goes on inside people’s brains when I say “AIs are going to kill all humans before 2030”?
Connor Leahy claims that when someone hears a claim like this, a couple of heuristics activate in their brain:
- “big things don’t happen”
- “computers don’t do stuff”
- “not-widely-believed ideas are false”
- “people talking about extinction risk are crazy”
- “AI takeover happens in sci-fi, thus not in real life”
Some of these are pretty reasonable heuristics that apply to most everyday situations; they just happen to fail here.
This is pretty valid, and relates to what Terry Tao calls “global” vs “local” errors in proofs. A local error is one step of the argument being wrong. A global error is that the conclusion is false: you just have a counterexample. Often when doing math with a math “novice” (e.g., a tutoring client), I’ll be like “look, your claim is false, here is a counterexample.” And they’ll be like “but all my steps were right! If the conclusion is wrong then you should be able to find which step was wrong.” I like global errors much more than local ones, because with local ones you can always hedge that the step can be patched or something. Anyways, the relation to talking to people is that people feel like the AI risk claim has a global error, based on bad vibes, so me showing that it checks out locally isn’t satisfying to them. Showing why these heuristics are false might be a better approach.
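To make the local vs. global distinction concrete, here’s a toy worked example (mine, not Tao’s, purely for illustration):

```latex
% Toy illustration of a "local" vs. a "global" error in a proof.
%
% Claim (false): every prime number is odd.
% "Proof": a prime has no divisors besides 1 and itself; even numbers are
% divisible by 2; therefore no prime can be even.
%
% Local critique: the step "divisible by 2, hence not prime" quietly assumes
% the number in question is not 2 itself.
% Global critique: the conclusion has a counterexample outright,
\[
  p = 2 \qquad \mbox{is both prime and even,}
\]
% so the claim is dead no matter how plausible each individual step sounded.
```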
Understanding the problem some more
Okay, so fine maybe I have discovered a better algorithm for talking to people about risk. However, empirically I’ve noticed the following pretty concerning thing: it seems like very few people understand when I talk about xrisk.
My use of the word “understand” here is a bit non-standard (thanks to N for pointing this out). I guess there are really 3 types of things that you could mean by “understand”:
- Understands the implications:
Death is bad. Human extinction is much worse. If someone you think is generally a reasonably smart/thoughtful person tells you humanity has a chance of being wiped out in a few years, I think it’d be a pretty good idea to investigate why.
A lot of times I’ll talk to someone bluntly about my predictions for the “future”. And they’ll be like “oh yeah, I think deepfakes are a big issue, and also ChatGPT is going to help people cheat in school!”
I think phrases like “human extinction” are anti-memes: the idea is slippery, or gets classified as abstract philosophy, so people can nod along and be like “yeah, for real, that sucks”, instead of being like “wow, that would be really really bad if it were true, let’s discuss the argument”.
In case this is you: I’m not talking about robots taking our jobs, or increasing fake news, or whatever.
I’m talking about the following simple predicate on the state of the universe evaluating to true: NUM(humans) = 0.
- Buying the argument:
The risk argument is based on 1 empirical fact (rate of progress is absurdly fast), 2 basic facts about ML (inner + outer alignment are hard), and 1 obvious mathematical claim (if a smart entity wants to kill you it’ll figure out how to do it).
The argument is pretty unambiguous. But it’s a bit hard to lay out, because people like to focus on local details rather than the big trajectory. If my story about how the US put the AIs in charge of autonomous weapons and then the AIs staged a coup has some holes, they’ll latch onto the holes and forget that an ASI would be a much better military strategist than me.
- Doing something about it!
If you buy the argument and understand the implications, then you should freaking do something about this. For instance, you should convince everyone in your circle of influence that this is the biggest issue ever. You should become informed and calibrated. You should consider trying to pitch in professionally to help. Some common ways to do this include (1) doing AI safety research (NOTE: please be so so so careful that you don’t accidentally just end up improving capabilities under the guise of doing safety work), and (2) doing AI policy or communication work, which seems really good.
I’ll award partial points for just literally doing anything abnormal and plausibly helpful though.
I feel like only a handful (maybe 1-3) of the people that I’ve “introduced the issue to” (by which I mean: I was the first person they knew personally who takes the issue seriously, and they had some discussions with me about it) “understand the implications”.
But upon reflection, this is a pretty weird thing to say. Do I “understand the implications”? Would I feel more sorrow at one certain death, or at this abstract, likely future extinction of billions of people and insane numbers of potential future humans? Emotions are just not at all good at gauging scope. It’s not clear that it’s adaptive / useful to freak out about AI xrisk. I think it’s moderately useful to the extent that it drives you out of complacency and to try uncomfortable things. But it can easily become unhealthy. I’ll discuss how to deal with this here.
A lot of people pay lip service to buying the argument, but they don’t understand the implications so I don’t think this counts.
Taking helpful action is pretty rare — although I’ve seen modest success amongst people that understand the implications. Maybe I again have a pretty high bar here. Have I taken helpful actions?
- I’ve definitely spent some time understanding the issue.
- I’ve done a modest amount of communication about the issue.
- I’ve started trying to solve the problem from a technical angle (see, e.g., MAD Agenda) — although I’ve made very little progress.
So idk if this should meet the bar. Maybe it doesn’t make sense to have some weird “bar” thing.
Maybe let’s just all try to do some stuff, and occasionally reflect on if there are much better ways that we could be spending our time.
Finding peace anyways
Okay, so the conclusion thus far is something like “I empirically observe that (most) people aren’t reacting to this issue the way I’d like when I talk about it”. Furthermore, having very different predictions about the future from other people seems to make some social interactions weird.
It’s basically tautological that the closest people to me will be the people that can understand important parts of my worldview. The people whose intuitions I trust about AI stuff will generally be people that seem to make good predictions and have predictions similar to mine. The people that I respect the most will be the people trying to do something to mitigate the risks. This makes a lot of sense and is fine. There are lots of people (in Boston + the Bay Area at least) that understand the risks from AI, believe this is a super big deal, and are even trying to do something about it! I’d love to meet and talk to more of these people; they really do seem like generally the most awesome people ever.
But anyways, now that I’m getting to the end of the post I’ve realized something: I’m actually pretty lucky and have super awesome friends.
By which I basically mean I have friends that are good at listening to my concerns, and helping out with various things such as communication efforts, alignment philosophy + math questions, and just being nice and stuff (e.g., cooking dinner w/ me, walking random places, doing zumba, etc).
So even if it sometimes feels like the whole world is against AI safety, it’s nice to remember that there are people that care about me, and that I care about.
I guess that’s why it’s important to try to save humanity in the first place --- humans are awesome.