1.24.3-6 alignment philosophy with N

week goals:

  • track time use at hour level granularity

    • goal (G) cannot be set afterwards
    • actual (A) should only be set immediately after an hour — but to start can have a little slack
    • score:
      • one point for each unset G or A
      • one point for each A that was egregious
    • objective: keep the score under 10 (see the sketch after the hourly log below)
    • things I’d like to avoid this week in particular (I don’t think these are generally bad, but they are not how I want to allocate my time this week for various reasons; they should generally be classified as egregious):
      • looking at the news
      • reading other people’s thoughts
      • writing blog posts about things other than technical alignment and math
  • make progress on a fun math problem

    • candidates:
      • dsipp, or n’s v
      • my backdoors q, or Victor’s
      • PDSG?
    • measure — a blog post
  • propose an interesting math question based on alignment philosophizing

    • measure — a blog post
  • 1.26.7

    • G —
    • A — breakfast
  • 1.26.8

  • 1.26.9

  • 1.26.10

  • 1.26.11
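
For concreteness, here is a minimal sketch of the scoring rule from the top of this page, in Python. The record format (one dict per hour with optional "G" and "A" strings and an "egregious" flag) is just an assumed placeholder, not the actual log format.

    # Sketch of the weekly score: one point per unset G or A, plus one per egregious A.
    def weekly_score(hours):
        score = 0
        for hour in hours:
            if not hour.get("G"):
                score += 1
            if not hour.get("A"):
                score += 1
            if hour.get("egregious"):
                score += 1
        return score

    # Example: the objective is a total under 10 for the week.
    log = [
        {"G": "breakfast", "A": "breakfast"},
        {"G": "math problem", "A": "", "egregious": False},
    ]
    print(weekly_score(log))  # 1 (one unset A)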

post mortem on the day:

  • had lunch with some friends and family — this was good
  • Also called some friends I hadn’t seen in a while at night. This was pretty nice.
  • Tracking time didn’t happen at all. This is really too bad.
  • After dinner, I spent a very large amount of time dealing with a computer issue.
    • I was trying to get my site ai-xrisk.surge.sh hosted.
      • finally did get it to work — sigh
    • I guess some of this time was spent writing the post as well.
  • I’m kind of annoyed that I spent so long on this computer issue.
  • I suspect I would not have done this if I had been logging time.
  • I also did end up reading some blog posts. They were related to what I was working on (my philosophy of alignment post), but I still think that I overall got distracted.
  • Overall, I’m excited to see substantially better performance tomorrow.

biggest goal — extract a good question out of Solving AI Safety from 1st principles (or MAD Agenda)

second goal — do some good math

  • 1.27.7
    • G — breakfast + goto
    • A — breakfast + went to Kendall
  • 1.27.8
  • 1.27.9
  • 1.27.10
  • 1.27.11
  • 1.27.12
    • G — lunch + discuss MAD Agenda or something with a friend
    • A — discussed high stakes alignment at the whiteboard
  • 1.27.1
    • G — math problem
    • A — discussed high stakes alignment with a friend
  • 1.27.2
  • 1.27.3
  • 1.27.4
    • G — make some progress on above question
    • A — tried to make up a question about high stakes alignment; wasn’t quite crisp.
  • 1.27.5
    • G — dinner + discuss health or productivity or altruism or trivia
    • A — dinner + explaining why AI is good and will get better soon
  • 1.27.6
    • G — zumba?
    • A — more dinner, short zumba
  • 1.27.7
    • G — write up MAD Agenda better OR do math
    • A — alignment with N — talked through some stuff — had a nice problem — a way of formalizing distillation
  • 1.27.8
    • G — try to make a MAD Agenda problem OR do math
    • A — thought about the distillation problem
  • 1.27.9
    • G — CAIP questionnaire thingy
    • A — distillation thing doesn’t work

post mortem on the day:

It was pretty good overall! Still quite a bit of reading; I will plan not to read so much tomorrow.

I did an okay job of logging, and it helped me keep better track of time.

Tuesday goals:

  • do some math (e.g., 4 hours). (I suspect that even though I’m impatient with math, if I don’t do it I won’t have good math intuitions, and it really has been a while since I’ve done a ton of serious math.)
    • p2p?
    • arc qs?
    • backdoor q?
    • dsipp?
    • pdsg?
  • make a good alignment question
    • formalize my simple high stakes alignment question
    • is there some version of N’s distillation q that can be interesting?
    • is there some sense in which we can prove amplification is possible?
  • only start reading toward the later end of the day.
    • read ELK. try to come up with reasons why it’s impossible.
    • or, fix up MAD Agenda and try to come up with legit math qs modelling this
  • comms
    • manage fb
    • CAIP thing?

plan for today: context in :thread:

major goal: make an interesting mathematical model of some alignment question

  • “high stakes alignment” math question (possibly involves trying to define RAT or MAD and working on my writeups about these)
  • Is there some version of N’s distillation q that can be interesting?
  • Make “amplification” into a math question.

minor goal: think about comms more

  • follow up on my Facebook post about AI risk (https://ai-xrisk.surge.sh/) — does it seem like anyone understood what I was saying?

  • does it feel like I contributed to race dynamics or made it more likely that useful coordination / governance happens?

  • brainstorm actually effective ways to communicate (current thoughts: https://pauseai.info/ suggests cold emailing "influential ppl", e.g. professors or political people, expressing concern about the risks and offering to chat), then do one of these.

  • 1.28.9

    • G — N distillation
    • A — typed up summary of what we discussed
  • 1.28.10

    • G — MK — alg fairness
    • A — MK — alg fairness
  • 1.28.11

    • G — N distillation; also, can we refute NCP?
    • A —
  • 1.28.12

    • G — model high stakes alignment?
    • A —
  • 1.28.1

    • G — model high stakes alignment?
    • A —
  • 1.28.2

    • G — model amp
    • A —
  • 1.28.3

    • G — model amp
    • A —
  • 1.28.4

    • G — math q
    • A —
  • 1.28.5

    • G — dinner
    • A —
  • 1.28.6

    • G — math q
    • A —
  • 1.28.7

    • G — comms p1
    • A —
  • 1.28.8

    • G — comms p2
    • A —
  • 1.28.9

    • G — stanley
    • A —