CAIP has organized a time for college students to talk to people in the White House about AI safety. Here are the specs.

CAIP is inviting college AI safety teams who are interested in presenting during Demo Day to submit a proposal below. Teams are encouraged to design an app, tool, video or website to serve as the focal point of their demonstration.

  1. A unique and creative demonstration of leading-edge AI capabilities.
  2. Ability to demonstrate/articulate how such capabilities could lead to potentially harmful or destructive outcomes for people, vital infrastructure, and/or society. Demonstrations can include, but are not limited to: voice emulation, video creation, prediction/forecasting, writing, robotics, tool use, cyber capabilities, biological research, etc. We encourage teams to be as creative and expansive as possible.

I decided to talk about alignment faking. You can see my proposal by searching for me on YouTube.

I wish the proposal could have been better — I’m quite worried that my demo is too complicated / hard for a non-technical audience to understand.

But it’s not crucial that this succeeds. And my spending a couple of days working on this is an improvement on the default, which is that no one does anything about this.

Doing this little project gave me an appreciation for how hard it is to actually do things. Neel Nanda has a nice blog post about this. Anyways, your daily inspirational message is:

Just do it. plz.

Initial thoughts — somehow talk about alignment faking paper.

  • Okay, China’s DeepSeek gives access to a hidden scratchpad.
  • Also, it’s quite easy to convince models to reason in a synthetic hidden scratchpad. You just ask nicely! (Say, “Please give your reasoning in <reasoning></reasoning> tags before giving your output in <output></output> tags.”)
  • Looks like Redwood + Anthropic made a replication guide.
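
The synthetic-scratchpad idea above can be sketched in a few lines. This is a minimal illustration, not code from the replication guide: `SCRATCHPAD_PROMPT` and `parse_scratchpad` are hypothetical names I made up, and the model call itself is omitted — you'd prepend the prompt to your request and run the returned completion through the parser.

```python
import re

# Instruction asking the model to think in a synthetic hidden scratchpad
# before answering (hypothetical wording, per the "just ask nicely" trick).
SCRATCHPAD_PROMPT = (
    "Please give your reasoning in <reasoning></reasoning> tags "
    "before giving your output in <output></output> tags."
)

def parse_scratchpad(completion: str) -> dict:
    """Split a completion into its 'hidden' reasoning and visible output."""
    reasoning = re.search(r"<reasoning>(.*?)</reasoning>", completion, re.DOTALL)
    output = re.search(r"<output>(.*?)</output>", completion, re.DOTALL)
    return {
        "reasoning": reasoning.group(1).strip() if reasoning else "",
        # Fall back to the whole completion if the model skipped the tags.
        "output": output.group(1).strip() if output else completion.strip(),
    }

# Example completion from a cooperative model:
completion = (
    "<reasoning>The user asked for 2+2; basic arithmetic.</reasoning>"
    "<output>4</output>"
)
parsed = parse_scratchpad(completion)
```

The point of the separation is that you can log or inspect `parsed["reasoning"]` (e.g. to look for alignment-faking-style deliberation) while only showing `parsed["output"]` downstream.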