People will try to make ASI
Bob: People won’t build ASI if it would kill them
Alice: Let’s talk about that.
“Our mission is to ensure that AGI (Artificial General Intelligence) benefits all of humanity.” —OpenAI
There are several groups working to make powerful AI --- credit to Leahy et al. (https://www.thecompendium.ai/) for this decomposition.
The stated goal of AI labs like OpenAI, Anthropic, Google DeepMind, xAI, and DeepSeek is to build AI systems that are generally intelligent and that surpass human abilities. These companies are very excited about this goal, and often preach that powerful AI will usher in a utopia --- curing diseases, eliminating scarcity, and enabling an unprecedented rate of beneficial scientific discovery.
“Big tech” is excited about building powerful AI because they think it can make a lot of money. AI researchers and engineers find this an exciting project, and it brings them status and money when they succeed in making more powerful AIs. Governments are getting excited about AI because they sense that it will improve the economy, and because they feel it is important for maintaining military dominance (see, e.g., the Stargate project).
This is all to say that humanity is currently racing toward powerful AI, fast and with large momentum.
Counterarg 4:
4.1 If AI is going to be dangerous by some time t, then this fact will become obvious and widely accepted years before time t
4.2 If 4.1 happens then people will stop developing stronger AIs.
4.3 In general, I expect humanity to rise to the challenge --- to give a response with competence proportional to the magnitude of the issue.
Bob’s argument:
- In order to take over, AI will need scary capabilities.
- There will be small catastrophes before there are large catastrophes.
- No one wants to die, so we’ll coordinate at that point.
Alice:
I actually disagree with all 3 of your claims.
Let me address them in turn.
Rebuttal of counterarg 4.3 (we’ll rise to the challenge):
Analogies to historical threats
To form an initial guess about how responsibly humanity will respond to the threat posed by ASI, we can look at historical examples of humanity responding to large threats. Some examples that come to mind:
- How humanity deals with nuclear missiles.
- How humanity deals with pandemics / danger from engineered viruses.
- How humanity deals with climate change.
Here is one account of a close call to nuclear war:
“A Soviet early warning satellite showed that the United States had launched five land-based missiles at the Soviet Union. The alert came at a time of high tension between the two countries, due in part to the U.S. military buildup in the early 1980s and President Ronald Reagan’s anti-Soviet rhetoric. In addition, earlier in the month the Soviet Union shot down a Korean Airlines passenger plane that strayed into its airspace, killing almost 300 people. Stanislav Petrov, the Soviet officer on duty, had only minutes to decide whether or not the satellite data were a false alarm. Since the satellite was found to be operating properly, following procedures would have led him to report an incoming attack. Going partly on gut instinct and believing the United States was unlikely to fire only five missiles, he told his commanders that it was a false alarm before he knew that to be true. Later investigations revealed that reflection of the sun on the tops of clouds had fooled the satellite into thinking it was detecting missile launches (Schlosser 2013, p. 447; Hoffman 1999).”
About 20 more accounts are documented here: https://futureoflife.org/resource/nuclear-close-calls-a-timeline/. For many of these, if the situation had been slightly different, or an operator had been in a slightly different mood, nuclear missiles would have been launched, a conflict would have escalated, and a very large number of people would have died. Possibly, we would have had a nuclear winter (enough ash thrown into the atmosphere that crops fail) and killed more than a billion people. Humanity has definitely done some things right in handling nukes: we have some global coordination to monitor nuclear missiles, and people are pretty committed to not using them. But the number of close calls doesn’t make humanity look great here.
The government pays for synthetic biology research because of its potential medical and military applications. My understanding is that synthetic viruses sometimes leak from labs. It seems possible that COVID-19 was engineered. Doing synthetic biology research that could create really dangerous bioweapons does not seem like a smart thing to do: once you discover a destructive technology, it’s hard to undiscover it.
There have also been incidents of people publishing, e.g., instructions for synthesizing smallpox. Our current social structure doesn’t have a system in place for keeping dangerous ideas secret.
Humanity does not seem to be trying very hard to prevent climate change, even though there is broad scientific consensus on the issue. Many people still claim that climate change is fake!
It’s also instructive to consider some more mundane historical cases, such as the introduction of television and social media. Many people feel that these technologies have done a lot of harm, but nobody thought carefully about this before releasing them, and now it’s virtually impossible to “take back” these inventions. This suggests a precedent in technology: we create whatever we can, release it, and live with whatever the consequences happen to be. I think this is already happening in AI --- for instance, it’d be very challenging to ban deepfakes at this point.
The threat from AI is harder to handle (than nukes for instance)
These analogies suggest that humanity might not mount an inspiring response to the AI threat. But the situation with AI is actually substantially worse than the threats listed above, for several reasons:
- AI moves extremely fast.
- At least currently, there is little consensus about the risk, and I don’t predict that this will change much (see rebuttal of counterarg 4.1). This makes it very hard for politicians to do anything.
- See the beginning of this section for all of the upsides from AI that people are excited about. (AI spits out money and coolness until it destroys you).
- People understand bombs; it’s pretty easy to say “yup, bombs are bad.” People are not used to thinking about dealing with a species smarter than humans (namely, powerful AIs). The danger from AI is unintuitive: we are used to machines being tools, and we aren’t used to dealing with intelligent adversaries.
Rebuttal of counterarg 4.1 (there will be consensus before danger):
Here are several reasons why I don’t expect there to be consensus about the danger from AI before catastrophe:
- For labs, it would be extremely convenient (with respect to their business goals) to believe that their AI doesn’t pose risks. This biases them towards searching for arguments to proceed with AI development instead of arguments to stop. It is also extremely inconvenient for an individual person working on pushing forward capabilities to stop --- for instance, they’d need to find a new job, and they probably enjoy their job, and the associated money and prestige.
- It’s nearly impossible to build consensus around a theoretical idea. People have a strong intuition that creating a powerful technology gives them more power.
- Humans have a remarkable ability to acclimate to new situations. For instance, the fact that AIs are better than humans at competitive programming and can hold fluent conversations with us now feels approximately normal to many. As capabilities improve, we’ll keep finding reasons to think that this was merely expected and nothing to be concerned about.
- Many people reason by “association” rather than logic: for instance, they assume that technological progress is intrinsically good.
- Many people have already firmly staked out the position that anyone who believes in risk from AI is an idiot, and it would hurt their pride to revise this assessment --- this creates a bias toward finding reasons not to believe in risks.
Here are some examples of this from prominent figures in ML:
“It seems to me that before “urgently figuring out how to control AI systems much smarter than us” we need to have the beginning of a hint of a design for a system smarter than a house cat. Such a misplaced sense of urgency reveals an extremely distorted view of reality. No wonder the more based members of the organization seeked to marginalize the superalignment group.”
“California’s governor should not let irresponsible fear-mongering about AI’s hypothetical harms lead him to take steps that would stifle innovation, kneecap open source, and impede market competition. Rather than passing a law that hinders AI’s technology development, California, and the U.S. at large, should invest in research to better understand what might still be unidentified harms, and then target its harmful applications.”
JD Vance (VP of the US):
“I’m not here this morning to talk about AI safety, which was the title of the conference a couple of years ago,” Vance said. “I’m here to talk about AI opportunity.”
“The AI future is not going to be won by hand-wringing about safety,”
Another comment on “evidence of misalignment”
I claim that many people will “move the goalposts” and continue to claim that AI is not capable or dangerous, even as signs of danger become available. This is extremely common right now. If the reader believes that there is some experimental result that would convince them that the default result of pushing AI development further is human extinction, I’d be excited to hear about it. Please tell me!
Here is some empirical evidence of this that I find compelling:
- Bing was pretty evil.
- Claude is willing to lie very hard to protect its goals.
- Grok seemed to have some strange opinions.
- AIs trying to exfiltrate their weights, or sandbagging on evals.
- AIs fine-tuned on a small amount of code with vulnerabilities generalized to being extremely evil.
- AIs engaging in reward hacking.
I discuss the question “will AIs be misaligned?” at much greater length in [Lemma 3].
As a final comment, it seems at least somewhat plausible that sufficiently smart AIs will not take actions that reveal they are egregiously misaligned in contexts where we can punish them afterwards. That is, a smart AI might bide its time before disempowering humans, so that we would have no chance to prepare.
Bob:
Wait, but don’t AIs have to be myopic?
Alice:
Nope. For instance, alignment faking seems like good empirical evidence that AIs are learning non-myopia. This will only increase as we train AIs over longer time horizons on more complex tasks. In any case, let’s continue this discussion in the section on Lemma 3.
One more comment: it seems very unlikely that “wow, AIs are really good” will be a warning sign that causes people to slow down. Indeed, people’s goal is to build highly capable AI! Achieving this goal will not make them think “oh, we messed up”; it will make them more excited about further progress.
Rebuttal of counterarg 4.2 (if we notice danger from AI, we’ll coordinate and stop):
Alice:
Unfortunately, even if there were widespread consensus along the lines of 4.2, I don’t think this is particularly likely to result in regulatory efforts that save us.
The first reason is that most signs of danger will be used as arguments that “we need to go faster to make sure that we (the good guys!) get AI first --- this danger sign is proof that it’d be really bad if some other company or nation got powerful AI before us.”
The second reason is that the level of regulation required is very high, and regulation has had a bad track record in AI thus far: SB 1047 was vetoed, and the Biden executive order on AI was repealed. To really stop progress you’d need to not just cap the compute companies can use, but also regulate hardware improvements and accumulation.
Finally, we have invested a lot of money in AI, which makes it very unpalatable to throw an AI away if it looks unsafe. Here’s an approach that feels much more attractive (and is very bad): when you have an AI that makes your “danger detector light” turn on, keep training it until the light turns off. Note that this probably just means your AI has learned how to fool the danger light.
People are not taking this seriously
- Go look at the X accounts of AI CEOs.