And thank you also to your partner!
@briantan
Co-Founder of WhiteBox Research
https://www.linkedin.com/in/brianctan/
Brian Tan
5 months ago
Thank you so much to everyone who has directed funding to WhiteBox so far! @tfburns @isaacdunn @sanyero @Llentati @TonyGao @zmavli @madonnadane @kej.
We still haven’t received funding for Cohort 2, so this funding is much needed and appreciated. We will likely receive a decision from the Long-Term Future Fund within the next two weeks, so we hope that goes well. Our fellowship’s first cohort is about to end in two weeks, and we hope to share more good outcomes over the next few months!
Brian Tan
9 months ago
To add to our last update, here is what has happened over the last four weeks:
We’ve further clarified our strategy and plans for the next year.
Our plan for the next year is to run two cohorts of our 5-month, part-time AI Interpretability Fellowship in Manila to produce junior mechanistic interpretability researchers. We’ve created a Theory of Change for the fellowship here.
Once we’ve completed two rounds, we plan to open our doors to those in Southeast Asia for the third round of our fellowship. The third round will likely be a full-time, 1-2 month version of our fellowship (e.g., starting June 2025).
Through our fellowships, we aim to kickstart and develop the AI safety community in Manila and Southeast Asia.
We also have these updated goals for our fellowship’s 1st cohort:
Have our fellows, by the end of September 2024, solve or make substantial progress on a handful of concrete, open MechInterp problems (e.g., those in Neel Nanda’s list) that have not yet been solved
Get at least one fellow to be counterfactually accepted by the end of 2024 into a full-time AI safety research fellowship (e.g., MATS’s Winter 2024-25 program)
Have at least four fellows spending at least 10 hrs/week working on alignment-oriented upskilling and/or interpretability projects by the end of 2024
Unfortunately, our team member Kriz Tahimic left due to health issues. We are grateful for Kriz’s help in co-founding and launching WhiteBox with us. Given his departure, we’ve increased Kyle Reynoso’s responsibilities and extended Kyle’s contract to work with us until August at 0.5 FTE (and past August once we get more funding).
We’re currently raising $92,300 to fund us until March 2025. (Our current funding will only last us until July or August.) The $92,300 would fund:
Additional operations costs for cohort 1, such as mentor and fellow stipends ($5,100)
Our 2nd cohort from late September 2024 to March 2025 ($87,200)
If you’re interested in funding or donating to us, you can contact me at brian@whiteboxresearch.org. We can send you our fundraising proposal and information on how to donate.
There are three main goals we want to achieve by August:
Conclude our Trials phase (training) with our planned in-house interpretability hackathon and shave off its remaining warts and inefficiencies for the next cohort
Have our fellows complete research excursions on selected problems in Neel Nanda’s list of concrete open problems (COPs), under the guidance of experienced external mentors
Fundraise enough money to fund our 1st and 2nd cohorts, as mentioned above
As shown in our Theory of Change, we will focus on having our fellows work on the COPs so they can upskill in interpretability research rapidly. However, we’re open to other proposals from mentors if there are adjacent problems that our fellows can help them with, so long as our fellows a) can practice MechInterp fundamentals in those projects, and b) can realistically complete the project by the end of the Proving Ground.
We’re also open to such proposals from our more advanced fellows, under the same constraints as above, provided that we and the available mentors deem them viable. Promising researchers often have very strong opinions on what they wish to work on, and letting them pursue those interests can make them more motivated to complete the rest of the fellowship.
Note also that this is not a bet on the COPs being vital to alignment, nor do we expect our fellows to produce immediately useful research by the end of the program: they are and will still be new to the field after all. Rather, we hope the problems will serve as excellent forcing functions for our fellows to get better at the fundamentals of MechInterp as quickly as possible.
We are still looking for 2 to 4 more research mentors with experience in mechanistic (or general) interpretability research for our Proving Ground (research phase) from June to August. Mentors only have to meet virtually with 1-2 fellows for ~45 minutes each week to provide research guidance.
They can choose to oversee more than one person or duo. As mentioned above, we are also open to having our fellows help their mentors with a MechInterp-adjacent task. For example, mentees can resolve accessible open issues in an existing interpretability project in exchange for the mentor-mentee relationship, as long as the issues are properly scoped to fit in our Proving Ground phase. If you are interested in being a mentor for our fellowship, please contact us at team@whiteboxresearch.org.
If you’re interested in or have experience in mechanistic (or general) interpretability, you can join our Discord server here and engage with people in our community, including our fellowship participants.
As mentioned, if you’re interested in funding us, you can contact me at brian@whiteboxresearch.org!
Brian Tan
10 months ago
We at WhiteBox Research have been progressing well since we got our initial regrant from Renan Araujo, and we’d like to share some updates on our progress below! (We will have a strategy session this week, and we’ll share another update within the next two weeks about our next steps and how others can help us.)
Here are our key achievements since September 2023 (up to March 19, 2024):
In November, we were approved for $61,460 in funding from the Long-Term Future Fund! Together with our funding from Manifund, this funds us until around August 2024.
We finalized more details of our program and named it the WhiteBox AI Interpretability Fellowship. It’s a five-month training and research program in Manila to master the fundamentals of mechanistic interpretability. We created this primer for the fellowship, and our training phase’s curriculum overview can be found here. [1]
We got 53 applicants and accepted 13 participants into our fellowship, surpassing our goal of getting 50 applicants and 12 participants. [2] [3] Our marketing and application process also helped us start building a wider community of people interested in AI interpretability. [4]
We onboarded Kyle Reynoso as a part-time teaching assistant in February, and he has contributed significantly since then. [5]
We ironed out a process for how participants can view, submit, and receive feedback on their exercise answers more seamlessly via GitHub Classroom and nbgrader.
We’re in the fourth week of our fellowship’s two-month training phase. So far, we’ve received generally positive feedback on the three Saturday sessions and the two-night retreat we held for participants, and we’ve maintained a consistent weekly tempo of adjustments and improvements to various aspects of the program.
Here are some footnotes to expound on our progress above:
[1] Since we opted for less experienced but high-potential participants (their average age is 21), we would probably have to cover more of the prerequisites than other programs (e.g., ARENA), which means we may only delve more into interpretability in the research phase of our program in June.
[2] We opted for a three-stage application process. Stage 1 involved solving Bongard problems and answering essay questions about alignment material (namely Ajeya Cotra’s Saint/Schemer/Sycophant post and Lee Sharkey’s Circumventing Interpretability post). Stage 2 tested their ability to solve coding problems that are tricky to solve even with GPT-4, and Stage 3 consisted of an unstructured interview largely based on the format of the insightful podcast Conversations with Tyler (Tyler Cowen).
[3] Some of the 13 people we accepted include an IOI silver medalist, a 19-year-old who recently got seed funding for his B2B startup, a fluent Lojban speaker who did contract work for an OpenPhil AI safety grantee, and a Master's student who won a gold medal in Taiwan for chemistry research she did in high school.
[4] We spent around a month marketing and community building to attract people to apply for current and/or future cohorts of our fellowship. We ran a successful virtual “Career Planning in the Age of AI” salon at the start of the year with around 27 attendees, and four people whom we ended up accepting joined it. We also started a community Discord server where people from the ambient community can interact with and discuss all sorts of questions with our participants, as a preliminary step towards wider community building in Southeast Asia. (We sent an invite to our server to all applicants, including those we rejected, some of whom already have more background in ML.)
[5] Our TA, Kyle Reynoso, graduated with the highest honors as a CS major from the top university in the country, was an officer in a local student EA chapter, and has a day job as an ML engineer for a marketing compliance AI startup in Australia.
Brian Tan
over 1 year ago
Sorry for the late reply here, but thanks Austin! (Oh and to clarify, Clark set up the prediction market, not Kriz!)
Brian Tan
over 1 year ago
Hi Renan, we really appreciate your decision to give us a regrant! Thanks also for sharing your thoughts about our project. We're taking your challenges/concerns into account, and we're already coming up with concrete plans to mitigate them.
Brian Tan
over 1 year ago
Hi Gaurav, thanks for weighing in on our project! Here are our thoughts on what you said, written mainly by Clark:
We agree there’s value in visiting Berkeley if people have the means, but we think it’s important that there be more alignment hubs in various regions. We think that a good number of potential AIS researchers in Southeast Asia would find it costly and/or hard to visit or move to Berkeley (especially in the current funding landscape), compared to visiting or working in Manila / SE Asia.
On research sprints to solve COPs: there are nuances to speed. Optimizing for paper-writing speed, for example, doesn't make sense, nor would treating the problems as Leetcode puzzles you can grind. The kind of speed we're optimizing for is closer to rate of exploration: how can we reduce our key uncertainties in a topic as quickly as possible? Can we discover all the mistakes and dead-ends ASAP to crystallize the topic's boundaries rapidly? Can we factor the open question into two dozen subquestions, each clearly doable in one sitting, and if so, how many of them can we do in a given timeframe? The crucial point is this: moving around produces information. We want to ruminate on questions in the middle of coding them up, develop the habit of thinking through problems in the space of a Jupyter notebook, and shrink this loop until it becomes second nature. We have also emailed Neel Nanda and Joseph Bloom about our project and aim to get their advice, so we won't veer too far off course while still learning to walk on our own.
On mentorship, we expect to do well enough in the training phase, but we likely need more mentorship in the research phase. That's why we're going to get a research adviser. During the research phase, the students will (mostly) get advice from Clark and Kriz, while we take advice from a research adviser. The goal is eventually to train ourselves and/or get enough people on our team so that we can confidently do the advising ourselves. This is also why we're adopting the flipped classroom model: we'll only have to produce/curate the learning materials once, and then just focus on getting them to do exercises. We're quite confident this is doable as Clark has taught classes of more than 40 people before.
Let us know if you have more thoughts or questions!