EDIT: I’ve decided that the main argument of this post is basically false.

Consequently, I’m more excited about:

  1. Less ambitious “little hacks” that don’t solve the whole problem but could plausibly be helpful.
    1. I’d put most control things in this category.
    2. There could also be some theory stuff in here.

It’s more important to start running than to run in precisely the right direction. But all else equal, it’s preferable to run in the correct direction.



I’m planning to focus on scalable alignment

I think of this as basically “solving alignment in the limit”

Does this actually make sense to do?

Why not solve alignment via practical means (i.e., AI control) and then have the controlled AIs do the hard stuff?

Why not try to buy more time via politics / communication?


I maintain that probably ~50% of AI safety resources (at least) should be dedicated to AI control.

But here’s some potential motivation for thinking about scalable alignment now:

  • maybe there’s a long critical path
  • maybe it’s good to have humans in the loop who have some ideas / intuitions about promising directions to point the AIs at
  • maybe solving this tricky theoretical problem has serial bottlenecks, and so doesn’t benefit as much from parallelization
  • generally, maybe this becomes a problem before AIs can solve it.

Anyways, I’m not advocating that too many AI safety researchers be allocated to scalable alignment.

Although if all the mathematicians in the world decided to take a stab (or even make a concerted effort) at solving scalable alignment, I wouldn’t complain :P (assuming they didn’t accidentally boost capabilities… which is a pretty strong assumption. But I digress.)

But I think it’s a good option for me, and there aren’t any obviously better options. So I’ll stick with it for now and re-evaluate at some unspecified future point in time :P

But for real, I’d like some people allocated to thinking about a longer-term stable solution.

And it’s not clear whether other approaches like “politics” are more or less cursed than theory anyways.

Anyways, let’s go find an easy-to-implement, competitive, cheap alignment algorithm!!!


Some more takes:

There are a lot of things which amount to buying time. For instance, trying to garner political will to slow stuff down, or implementing control measures.

But, what are we buying time for?

I think we should be buying time for getting a scalable alignment solution.

I think I have some relevant skills for thinking about this, and it’s not a very saturated area.

Being helpful on technical stuff is only going to happen in the far right tail of outcomes.

So I’m officially not going to worry about whether my plan is reasonable for the next 5 months.