1.24.3-6 alignment philosophy with N
week goals:
- track time use at hour-level granularity
- goal (G) cannot be set afterwards
- actual (A) should only be set immediately after an hour — but to start can have a little slack
- score: a point for every unset G or A, plus a point if the A was egregious (see the sketch after this list)
- objective: obtain score less than 10
- things I'd like to avoid this week in particular (I don't think these things are generally bad, but they are not how I want to allocate my time this week for various reasons; these should generally be classified as egregious):
- looking at the news
- reading other people’s thoughts
- writing blog posts about things other than technical alignment and math
- make progress on a fun math problem
- candidates:
- dsipp, or N's v
- my backdoors q, or Victor's
- PDSG?
- measure — a blog post
- candidates:
- propose an interesting math question based on alignment philosophizing
- measure — a blog post
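(a minimal sketch of the scoring rule, just to pin it down; the per-hour log format here, a list of dicts with "G", "A", and a hand-marked "egregious" flag, is made up for illustration)

```python
# a point for each unset G, a point for each unset A,
# plus a point whenever the A was marked egregious
def weekly_score(log):
    score = 0
    for hour in log:
        if not hour.get("G"):
            score += 1
        if not hour.get("A"):
            score += 1
        elif hour.get("egregious"):
            score += 1
    return score

# toy example: one clean hour, one hour with no G and an egregious A
example_log = [
    {"G": "breakfast", "A": "breakfast", "egregious": False},
    {"G": "", "A": "looking at the news", "egregious": True},
]
assert weekly_score(example_log) == 2  # objective for the week: stay under 10
```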
- 1.26.7
- G —
- A — breakfast
- 1.26.8
- G — write Solving AI Safety from 1st principles
- A — write Solving AI Safety from 1st principles
- 1.26.9
- G —
- A — write Solving AI Safety from 1st principles
- 1.26.10
- G —
- A — write Solving AI Safety from 1st principles
- 1.26.11
- G —
- A — write Solving AI Safety from 1st principles + read some of Paul’s blog posts to understand high stakes vs low stakes alignment
post mortem on the day:
- had lunch with some friends and family — this was good
- also called some friends I hadn't seen in a while at night. this was pretty nice.
- tracking time didn’t happen at all. this is really too bad.
- after dinner, i spent a very large amount of time dealing with a computer issue.
- I was trying to get my site ai-xrisk.surge.sh hosted.
- finally did get it to work — sigh
- I guess some of this time was spent writing the post as well.
- I'm kind of annoyed that I spent so long on this computer issue.
- I suspect I would have not done this if I had been logging time.
- I also did end up reading some blog posts. they were related to what I was working on (my philosophy of alignment post), but I still think that overall I got distracted.
- overall, I'm excited to see substantially better performance tmw
biggest goal — extract a good question out of Solving AI Safety from 1st principles (or MAD Agenda)
second goal — do some good math
- 1.27.7
- G — breakfast + goto
- A — breakfast + went to Kendall
- 1.27.8
- G — read Paul’s posts on low vs high stakes alignment ⇒ summarize in Solving AI Safety from 1st principles
- A — started by posting the xrisk Facebook post, then worked on low vs high stakes alignment
- 1.27.9
- G — expand the tree of Solving AI Safety from 1st principles
- A — finished up low vs high stakes alignment
- 1.27.10
- G — expand tree of Solving AI Safety from 1st principles
- A — thought about it a bit
- 1.27.11
- G — expand tree of Solving AI Safety from 1st principles — actually decided I need to read some more of Paul's thinking about the problem first
- A — read Paul's stuff
- 1.27.12
- G — lunch + discuss MAD Agenda or something with a friend
- A — discussed high stakes alignment at whiteboard
- 1.27.1
- G — math problem
- A — discussed high stakes alignment w a friend
- 1.27.2
- G — math problem
- A — reading iterated amplification
- 1.27.3
- G — make a good question from Solving AI Safety from 1st principles
- A — probably was just reading IDA stuff
- 1.27.4
- G — make some progress on above question
- A — tried to make up a question about high stakes alignment; it wasn't quite crisp.
- 1.27.5
- G — dinner + discuss health or productivity or altruism or trivia
- A — dinna + explaining why AI is good and will get better soon
- 1.27.6
- G — zumba?
- A — more dinner, short zumba
- 1.27.7
- G — write up MAD Agenda better OR do math
- A — alignment with N — talked thru some stuff — had a nice problem — a way of formalizing distillation
- 1.27.8
- G — try to make a MAD Agenda problem OR do math
- A — thought about distillation problem
- 1.27.9
- G — CAIP questionnaire thingy
- A — distillation thing doesn't work
post mortem on the day:
it was pretty good overall! still quite a bit of reading. will plan to not read so much tmw.
I did an okay job of logging and it helped me keep track of time better.
Tuesday goals:
- do some math (e.g. 4 hours) (I suspect that even though I'm impatient wrt math, if I don't do it I won't have good math intuitions, and it really has been a while since I've done a ton of serious math)
- p2p?
- arc qs?
- backdoor q?
- dsipp?
- pdsg?
- make a good alignment question
- formalize my simple high stakes alignment question
- is there some version of N’s distillation q that can be interesting?
- is there some sense in which we can prove amplification is possible?
- only start reading at the later end of the day.
- read ELK. try to come up with reasons why it’s impossible.
- or, fix up MAD Agenda and try to come up with legit math qs modelling this
- comms
- manage fb
- caip thing?
plan for today: context in :thread:
major goal: make an interesting mathematical model of some alignment question
- ”high stakes alignment” ⇒ math question (possibly involves trying to define RAT or MAD and working on my writeups about these)
- Is there some version of N’s distillation q that can be interesting?
- Make “amplification” into a math question.
minor goal: think about comms more
- follow up on my Facebook post about AI risk (https://ai-xrisk.surge.sh/) — does it seem like anyone understood what I was saying?
- does it feel like I contributed to race dynamics or made it more likely that useful coordination / governance happens?
- brainstorming what are actually effective ways to communicate (current thoughts: https://pauseai.info/ suggests cold emailing “influential ppl”, e.g. professors or political ppl, and expressing concern about the risks and offering to chat) + then do one of these ways
- 1.28.9
- G — N distillation
- A — typed up summary of what we discussed
- 1.28.10
- G — MK — alg fairness
- A — MK — alg fairness
- 1.28.11
- G — N distillation — also, can we refute NCP?
- A —
- 1.28.12
- G — model high stakes alignment?
- A —
- 1.28.1
- G — model high stakes alignment?
- A —
- 1.28.2
- G — model amp
- A —
- 1.28.3
- G — model amp
- A —
- 1.28.4
- G — mathq
- A —
- 1.28.5
- G — dinna
- A —
- 1.28.6
- G — mathq
- A —
- 1.28.7
- G — comms p1
- A —
- 1.28.8
- G — comms p2
- A —
- 1.28.9
- G — stanley
- A —