Ryan has recently written a bunch of really long, really interesting articles about how he is thinking about control. I will read these and keep notes here.
Things I'd like to focus on:
- Record ideas that surprised me.
- What do I disagree with?
- What is he missing?
We will use AIs for biodefense
- I'm not sure this is true.
- I feel like people might prioritize medical applications (like "cure cancer")
Here's what Biden has to say about this; idk if Trump has similar thoughts. I guess leaving the WHO doesn't bode well?
Prevent (or catch) rogue internal deployments
- There are two parts to this:
- Cultivate willpower within companies to try to prevent rogue internal deployments
- Like, I'm less scared about AIs disabling our monitoring schemes than about AI companies not implementing very comprehensive monitoring schemes in the first place
- Think about ways that this could happen "silently", and monitor for these. Make sure that all attack surfaces are covered.
- You also just want to have a pretty robust monitoring system that allows you to prevent collusion
Make sure that models aren't fine-tuned to evade monitors / do steganography
- wow this seems really important!
Ryan claims that in undignified worlds,
- getting useful work out of AIs is more useful than preventing rogue internal deployments
- my understanding of the reason is:
- in an undignified world, even if you catch a rogue internal deployment it does not result in a substantive pause.