Manifund
Garrett Baker

@GarretteBaker

I'm an independent alignment researcher, self-taught in machine learning, convex optimization, and probability theory.

https://github.com/GarretteBaker/
$0 total balance
-$1 charity balance
$0 cash balance

$1 in pending offers

About Me

For approximately the past year, I’ve been doing alignment research full-time, working on a variety of approaches and trying to understand the problem in enough depth to invent new ones. If funded, I plan to continue approximately the same work as before, which has historically been scalable mechanistic interpretability, formal and prosaic corrigibility, reflective stability, and a bunch of value theory stuff, along with lots of upskilling in convex optimization, machine learning, neuroscience, and economics.

My current project is an attempt to connect the tools & theory of singular learning theory with our knowledge of the inductive biases and loss landscapes of large language models.

Projects

Garrett Baker salary to study the development of values of RL agents over time

Outgoing donations

AI Safety Reading Group at metauni [Retrospective]
$10
8 months ago
Act I: Exploring emergent behavior from multi-AI, multi-human interaction
$96
8 months ago
Act I: Exploring emergent behavior from multi-AI, multi-human interaction
$50
8 months ago
Lightcone Infrastructure
$95
8 months ago
Next Steps in Developmental Interpretability
$200
9 months ago
Lightcone Infrastructure
$50
9 months ago

Comments

Act I: Exploring emergent behavior from multi-AI, multi-human interaction

Garrett Baker

9 months ago

I have seen some of amp's work, and it is pretty interesting and novel in the grand scheme of things.

Lightcone Infrastructure

Garrett Baker

9 months ago

Lightcone consistently does quality things.

Garrett Baker salary to study the development of values of RL agents over time

Garrett Baker

10 months ago

@Austin Here is the LW post: https://www.lesswrong.com/posts/Bczmi8vjiugDRec7C/what-and-why-developmental-interpretability-of-reinforcement