1.24.3-6 alignment philosophy with N
week goals:
- track time use at hour-level granularity
- goal (G) cannot be set afterwards
- actual (A) should only be set immediately after an hour — but to start can have a little slack
- score: a point for every unset G or A, plus a point if the A was egregious (see the sketch after this list)
- objective: obtain score less than 10
- things I'd like to avoid this week in particular (I don't think these things are generally bad, but they are not how I want to allocate my time this week for various reasons; these should generally be classified as egregious):
- looking at the news
- reading other people’s thoughts
- writing blog posts about things other than technical alignment and math
- make progress on a fun math problem
- candidates:
- dsipp, or N's v
- my backdoors q, or Victor's
- PDSG?
- measure — a blog post
- candidates:
- propose an interesting math question based on alignment philosophizing
- measure — a blog post
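(a minimal sketch of the scoring rule, just to pin it down; the per-hour log format here, a list of dicts with "G", "A", and a hand-marked "egregious" flag, is made up for illustration)

```python
# a point for each unset G, a point for each unset A,
# plus a point whenever the A was marked egregious
def weekly_score(log):
    score = 0
    for hour in log:
        if not hour.get("G"):
            score += 1
        if not hour.get("A"):
            score += 1
        elif hour.get("egregious"):
            score += 1
    return score

# toy example: one clean hour, one hour with no G and an egregious A
example_log = [
    {"G": "breakfast", "A": "breakfast", "egregious": False},
    {"G": "", "A": "looking at the news", "egregious": True},
]
assert weekly_score(example_log) == 2  # objective for the week: stay under 10
```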
- 1.26.7
- G —
- A — breakfast
- 1.26.8
- G — write Solving AI Safety from 1st principles
- A — write Solving AI Safety from 1st principles
- 1.26.9
- G —
- A — write Solving AI Safety from 1st principles
- 1.26.10
- G —
- A — write Solving AI Safety from 1st principles
- 1.26.11
- G —
- A — write Solving AI Safety from 1st principles + read some of Paul’s blog posts to understand high stakes vs low stakes alignment
post mortem on the day:
- had lunch with some friends and family — this was good
- also called some friends I hadn't seen in a while at night. this was pretty nice.
- tracking time didn’t happen at all. this is really too bad.
- after dinner, i spent a very large amount of time dealing with a computer issue.
- I was trying to get my site ai-xrisk.surge.sh hosted.
- finally did get it to work — sigh
- I guess some of this time was spent writing the post as well.
- I'm kind of annoyed that I spent so long on this computer issue.
- I suspect I would have not done this if I had been logging time.
- I also did end up reading some blog posts. they were related to what I was working on (my philosophy of alignment post), but I still think that overall I got distracted.
- overall, I'm excited to see substantially better performance tmw
biggest goal — extract a good question out of Solving AI Safety from 1st principles (or MAD Agenda)
second goal — do some good math
- 1.27.7
- G — breakfast + goto
- A — breakfast + went to Kendall
- 1.27.8
- G — read Paul’s posts on low vs high stakes alignment ⇒ summarize in Solving AI Safety from 1st principles
- A — started by posting the xrisk Facebook post, then worked on low vs high stakes alignment
- 1.27.9
- G — expand the tree of Solving AI Safety from 1st principles
- A — finished up low vs high stakes alignment
- 1.27.10
- G — expand tree of Solving AI Safety from 1st principles
- A — thought about it a bit
- 1.27.11
- G — expand tree of Solving AI Safety from 1st principles — actually decided I need to read some more of Paul's thinking about the problem first
- A — read Paul's stuff
- 1.27.12
- G — lunch + discuss MAD Agenda or something with a friend
- A — discussed high stakes alignment at whiteboard
- 1.27.1
- G — math problem
- A — discussed high stakes alignment w a friend
- 1.27.2
- G — math problem
- A — reading iterated amplification
- 1.27.3
- G — make a good question from Solving AI Safety from 1st principles
- A — probably was just reading IDA stuff
- 1.27.4
- G — make some progress on above question
- A — tried to make up a question about high stakes alignment; it wasn't quite crisp.
- 1.27.5
- G — dinner + discuss health or productivity or altruism or trivia
- A — dinna + explaining why AI is good and will get better soon
- 1.27.6
- G — zumba?
- A — more dinner, short zumba
- 1.27.7
- G — write up MAD Agenda better OR do math
- A — alignment with N — talked thru some stuff — had a nice problem — a way of formalizing distillation
- 1.27.8
- G — try to make a MAD Agenda problem OR do math
- A — thought about distillation problem
- 1.27.9
- G — CAIP questionnaire thingy
- A — distillation thing doesn't work
post mortem on the day:
it was pretty good overall! still quite a bit of reading. will plan to not read so much tmw.
I did an okay job of logging and it helped me keep track of time better.
Tuesday goals:
- do some math (e.g. 4 hours) (I suspect that even though I'm impatient wrt math, if I don't do it I won't have good math intuitions, and it really has been a while since I've done a ton of serious math)
- p2p?
- arc qs?
- backdoor q?
- dsipp?
- pdsg?
- make a good alignment question
- formalize my simple high stakes alignment question
- is there some version of N’s distillation q that can be interesting?
- is there some sense in which we can prove amplification is possible?
- only start reading at the later end of the day.
- read ELK. try to come up with reasons why it’s impossible.
- or, fix up MAD Agenda and try to come up with legit math qs modelling this
- comms
- manage fb
- caip thing?
plan for today: context in :thread:
major goal: make an interesting mathematical model of some alignment question
- ”high stakes alignment” ⇒ math question (possibly involves trying to define RAT or MAD and working on my writeups about these)
- Is there some version of N’s distillation q that can be interesting?
- Make “amplification” into a math question.
minor goal: think about comms more
- follow up on my Facebook post about AI risk (https://ai-xrisk.surge.sh/) — does it seem like anyone understood what I was saying?
- does it feel like I contributed to race dynamics or made it more likely that useful coordination / governance happens?
- brainstorming what are actually effective ways to communicate (current thoughts: https://pauseai.info/ suggests cold emailing “influential ppl”, e.g. professors or political ppl, and expressing concern about the risks and offering to chat) + then do one of these ways
- 1.28.9
- G — N distillation
- A — typed up summary of what we discussed
- 1.28.10
- G — MK — alg fairness
- A — MK — alg fairness
- 1.28.11
- G — N distillation — also, can we refute NCP?
- A —
- 1.28.12
- G — model high stakes alignment?
- A —
- 1.28.1
- G — model high stakes alignment?
- A —
- 1.28.2
- G — model amp
- A —
- 1.28.3
- G — model amp
- A —
- 1.28.4
- G — mathq
- A —
- 1.28.5
- G — dinna
- A —
- 1.28.6
- G — mathq
- A —
- 1.28.7
- G — comms p1
- A —
- 1.28.8
- G — comms p2
- A —
- 1.28.9
- G — stanley
- A —