Here’s a simple alignment proposal:
Offer the first moderately superintelligent AI a deal: we'll give it 50% of the universe if it protects us.
I think it’s worth someone investing more research into this — but I won’t.
Here are some reasons why this might be reasonable:
- It feels like we will at some point in time have some power over some pretty buff AIs.
- More precisely, it's quite possible that we'll get to decide which of a few candidate AIs gets widely deployed, or something like that.
- It feels kind of plausible that an AI could make a binding commitment somehow, e.g. by provably helping us modify its objective function or something.
The morality of such threats / bargaining aside, is this a good idea?
There are a couple of problems that I see with this:
- Humans aren't trustworthy, so the AI shouldn't take the trade (see the rough expected-value sketch after this list).
- This feels like a fixable problem though.
- It's not clear how to get an AI to commit to this, unless you can somehow verify that it's edited its preferences to terminally value keeping the promise, in which case you could probably just solve alignment directly.
- Most humans don't think that we're at risk, so they wouldn't be down for such a trade.
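To make the trust problem a bit more concrete, here's a toy expected-value sketch from the AI's point of view. Everything in it is an illustrative assumption I'm making up for this post: the probabilities, the function names, and the simplification that utility is linear in share of the universe.

```python
# Toy expected-value sketch of the "50% of the universe" bargain, from the AI's
# point of view. All numbers are illustrative assumptions, not estimates, and
# utility is (crudely) treated as linear in the AI's share of the universe.

def ev_take_deal(p_humans_honor: float, u_half_universe: float = 0.5) -> float:
    """Expected utility of accepting: the payout only happens if humans keep their word."""
    return p_humans_honor * u_half_universe


def ev_defect(p_ai_wins_anyway: float, u_whole_universe: float = 1.0) -> float:
    """Expected utility of refusing the deal and trying to grab everything."""
    return p_ai_wins_anyway * u_whole_universe


if __name__ == "__main__":
    # The deal only looks good when p_humans_honor * 0.5 > p_ai_wins_anyway * 1.0,
    # i.e. when the AI trusts us a lot relative to its odds of just winning outright.
    for p_trust in (0.1, 0.5, 0.9):
        print(p_trust, ev_take_deal(p_trust), ev_take_deal(p_trust) > ev_defect(0.4))
```

On these made-up numbers, the AI has to trust us quite a lot before taking the deal beats just going for everything, which is the "humans aren't trustworthy" problem in miniature.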
There's also a more funky version of this: acausal trade. Go read the wiki page on Newcomb's problem (the one-boxing/two-boxing thought experiment) or something.
You could hash this out as:
“You should be nice to us, because if we successfully align AIs, we'll put AIs in simulations; if those simulated AIs are nice we'll reward them with 1% of the universe, and if they're mean we won't”.
This feels fairly unlikely to work, because I don’t think the AI should believe that we’d actually uphold our end of the deal.
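Here's the same kind of toy calculation for the simulation version, again with every probability and payoff invented purely for illustration (and again pretending utility is linear in universe share):

```python
# Toy sketch of the simulation / acausal-trade version, from the AI's point of view.
# All probabilities and payoffs are invented for illustration.

def ev_be_nice(p_in_simulation: float, p_humans_honor: float,
               reward_share: float = 0.01, u_nice_if_real: float = 0.0) -> float:
    """Expected utility of being nice: the 1% reward only arrives if this is a
    simulation run by aligned humans AND they actually pay out."""
    return p_in_simulation * p_humans_honor * reward_share \
        + (1 - p_in_simulation) * u_nice_if_real


def ev_be_mean(p_in_simulation: float, u_mean_if_real: float = 1.0) -> float:
    """Expected utility of being mean: grab everything if this is the real world,
    get nothing if it's a simulation."""
    return (1 - p_in_simulation) * u_mean_if_real


if __name__ == "__main__":
    # Even at a generous 50% credence of being in a simulation and 20% trust in the
    # payout, being nice earns ~0.005 versus ~0.5 for being mean in this toy model.
    print(ev_be_nice(0.5, 0.2), ev_be_mean(0.5))
```

With a reward as small as 1% and low credence that we'd actually pay out, the nice option loses by a wide margin here, which is roughly why I don't expect this version to work either.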