Ryan has recently written a bunch of really long, really interesting articles about how he is thinking about control. I will read these and keep notes here.

Things I’d like to focus on:

  • Record ideas that surprised me.
  • What do I disagree with?
  • What is he missing?

We will use AIs for biodefense

  • I’m not sure this is true.
    • I feel like people might prioritize medical applications (like “cure cancer”) instead.

Here’s what Biden has to say about this. I don’t know if Trump has similar thoughts; I guess leaving the WHO doesn’t bode well?

Prevent (or catch) rogue internal deployments

  • There are a few parts to this:
  1. Cultivate willpower within companies to try to prevent rogue internal deployments
    1. I’m less scared of AIs disabling our monitoring schemes than of AI companies not implementing comprehensive monitoring schemes in the first place
  2. Think about ways that this could happen “silently”, and monitor for these. Make sure that all attack surfaces are covered.
  3. You also just want to have a pretty robust monitoring system that allows you to prevent collusion

Make sure that models aren’t fine-tuned to evade monitors / do steganography

  • wow this seems really important!

Ryan claims that in undignified worlds:

  • getting useful work out of AIs matters more than preventing rogue internal deployments
  • my understanding of the reason is
    • in an undignified world, even if you catch a rogue internal deployment, it does not result in a substantive pause, so catching one buys relatively little.