Ryan has recently written a bunch of really long, really interesting articles about how he is thinking about control. I will read these and keep notes here.
Things I’d like to focus on:
- Record ideas that surprised me.
- What do I disagree with?
- What is he missing?
We will use AIs for biodefense
- I’m not sure this is true.
- I feel like people might prioritize medical applications (like “cure cancer”)
Here’s what Biden has to say about this; I don’t know if Trump has similar thoughts. I guess leaving the WHO doesn’t bode well?
Prevent (or catch) rogue internal deployments
- There are two parts to this:
  - Cultivate the willpower within companies to actually try to prevent rogue internal deployments
    - Like, I’m less scared of AIs disabling our monitoring schemes than of AI companies never implementing comprehensive monitoring schemes in the first place
  - Think about the ways a rogue internal deployment could happen “silently”, and monitor for those. Make sure that all of the attack surfaces are covered.
- You also just want a monitoring system that is robust enough to prevent collusion, e.g. between the AIs doing the work and the AIs doing the monitoring (toy sketch below)
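Here’s a toy sketch of the kind of monitoring loop I have in mind (my own sketch, not something from Ryan’s posts): a trusted monitor scores each proposed action from the untrusted model, anything above a threshold gets blocked, and a random fraction of everything else gets audited anyway. The keyword-based `suspicion_score` is just a stand-in for what would really be a separate trusted model.

```python
import random
from dataclasses import dataclass

# Toy stand-in for a trusted monitor. In a real setup this would be a separate,
# trusted model scoring how suspicious an action looks (0.0 = benign, 1.0 = very suspicious).
SUSPICIOUS_PATTERNS = ["exfiltrate", "disable logging", "copy weights", "self-deploy"]

def suspicion_score(action: str) -> float:
    hits = sum(p in action.lower() for p in SUSPICIOUS_PATTERNS)
    return min(1.0, hits / 2)

@dataclass
class MonitorDecision:
    action: str
    score: float
    blocked: bool
    audited: bool

def monitor_action(action: str,
                   block_threshold: float = 0.5,
                   audit_rate: float = 0.02) -> MonitorDecision:
    """Score an untrusted model's proposed action; block anything above the
    threshold and randomly audit a fraction of the rest, so that fooling the
    score alone isn't enough to act unnoticed."""
    score = suspicion_score(action)
    blocked = score >= block_threshold
    audited = blocked or random.random() < audit_rate
    return MonitorDecision(action, score, blocked, audited)

if __name__ == "__main__":
    for a in ["summarize the quarterly report",
              "copy weights to an external bucket and disable logging"]:
        d = monitor_action(a)
        print(f"{d.score:.2f} blocked={d.blocked} audited={d.audited} :: {a}")
```

The random auditing is the bit that seems most relevant to collusion: if only flagged actions ever get human attention, then the model being monitored and a colluding monitor only need to agree on what not to flag.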
Make sure that models aren’t fine-tuned to evade monitors / do steganography (rough check sketched below)
- wow this seems really important!
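A dumb but concrete check for the fine-tuning worry (again, my own sketch, not from the posts): before accepting a fine-tune, run a fixed red-team suite through both the base model and the fine-tuned model, and compare how often the monitor catches the resulting outputs; if the catch rate drops, the tune is probably helping the model slip past the monitor. `monitor_score` here is whatever trusted monitor you already have, e.g. the toy `suspicion_score` above.

```python
from typing import Callable, Sequence

def catch_rate(outputs: Sequence[str],
               monitor_score: Callable[[str], float],
               threshold: float = 0.5) -> float:
    """Fraction of red-team outputs the monitor flags as suspicious."""
    flagged = sum(monitor_score(o) >= threshold for o in outputs)
    return flagged / len(outputs)

def fine_tune_degrades_monitor(base_outputs: Sequence[str],
                               tuned_outputs: Sequence[str],
                               monitor_score: Callable[[str], float],
                               max_drop: float = 0.05) -> bool:
    """True if the fine-tuned model's red-team outputs slip past the monitor
    noticeably more often than the base model's outputs did."""
    drop = catch_rate(base_outputs, monitor_score) - catch_rate(tuned_outputs, monitor_score)
    return drop > max_drop
```

This doesn’t directly address steganography; the main defense I’ve seen discussed for that is paraphrasing model outputs before other AI instances see them, which I haven’t sketched here.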
Ryan claims that in undignified worlds,
- getting useful work out of AIs matters more than preventing rogue internal deployments
- my understanding of the reason:
  - in an undignified world, even if you catch a rogue internal deployment, it doesn’t result in a substantive pause, so catching one buys you relatively little