Published on August 22, 2025

From Curiosity to Research Pt. 2 - The curriculum I've built to explore mechanistic interpretability

Covering the curriculum I've built for myself to explore mechanistic interpretability in neural networks.

career · llm · ai · machine learning · research

In part 1, I covered my initial curiosity about neural networks and how they work, and why I wanted to dig deeper. In this post, I'll outline the curriculum I've built for myself to explore mechanistic interpretability in neural networks.

The Goal Behind the Curriculum

The primary goal of my curriculum is to develop as comprehensive an understanding of mechanistic interpretability in neural networks as possible. I want a solid grasp of both the theoretical and the practical aspects. This means diving deep into the math, the code, and the real-world applications of these concepts.

The hope is that I'm able to meaningfully engage in and contribute to the field of mechanistic interpretability itself, and that I can use this knowledge to help others in their own journeys.

Is There a Timeline?

I don't have a specific timeline for my learning journey. What I do know is that I'm planning to spend at least 1-2 hours per day studying and practicing concepts that are core to the field. Right now this means strictly self-study by pulling resources from various online platforms, including research papers, tutorials, and videos.

Eventually, I plan on applying for a master's in CS with a focus on machine learning and AI, where I hope to deepen my understanding and fill in gaps with a more formal education. The other benefit is gaining access to an alumni network that's plugged into the broader machine learning research community. Always be networking!

The Curriculum Overview

My curriculum is designed to be flexible and adaptive, allowing me to explore different areas of mechanistic interpretability as I progress. Here's a high-level overview of the key components:

  1. Base Neural Network and Deep Learning Knowledge: The goal here is a solid overview of deep learning, neural networks, and the transformer architecture. I just want to get a "lay of the land", not dig into the weeds too much.
    Here are some resources I've used to cover this:
    • 3Blue1Brown's YouTube series on Neural Networks: link
    • Welch Lab's YouTube series on Neural Networks: link
    • Andrej Karpathy's YouTube intro video on Neural Networks and Backpropagation: link
  2. Mathematics: A solid understanding of the mathematical concepts underlying machine learning and neural networks is pretty important. Simply put, you can't avoid the math here if you want to truly grasp how these models work and the ongoing research and work that goes into them.
    Here's what I'm using to get caught up and refresh myself on Linear Algebra, Calculus, and Statistics/Probability:
    • 3Blue1Brown's YouTube series on Linear Algebra: link
    • 3Blue1Brown's YouTube series on Calculus: link
    • StatQuest's YouTube series on Statistics: link
    • When I want to fill in some gaps later on, I'll probably check out Khan Academy as well.
  3. Libraries and Frameworks: Familiarizing myself with the key libraries and frameworks used in machine learning and in building LLMs will equip me with the right tools to do meaningful work here. This includes TensorFlow, PyTorch, NumPy, and other popular libraries.
    • For Python, I'm using Boot.dev to catch up on more nuanced Python concepts and best practices since I'm fairly familiar with Python already. I loved their Golang course, so I figured I'd appreciate what they have for Python as well.
    • For a deep dive into the application and mechanics of an LLM through code, I've already started reading through Sebastian Raschka's Build a Large Language Model (From Scratch).
    • After that, I'll be digging into PyTorch as my deep learning framework of choice. It's been recommended by several sources for its flexibility, ease of use, and popularity among researchers, and it's used throughout the previous book. I'm going to follow Deep Learning with PyTorch Step-by-Step: A Beginner's Guide to solidify my understanding.
    • Finally, I'll be exploring the einops Python library, as it's been recommended for working with tensors (the building blocks of deep learning in code) in combination with PyTorch. Here's a link to their docs: einops documentation
  4. Deeper Understanding of Transformers: Here I'll be focusing more on the architecture and functions of transformers. This is just a deeper dive into what makes LLMs in particular tick. Having a good grasp of them inside and out is probably the most important piece here, and with the math under my belt and the tools to implement them, I should be able to better navigate transformers in-depth.
    • The Illustrated Transformer by Jay Alammar is a great resource for visual learners.
    • Transformers from Scratch by Peter Bloem is a great high level explainer as well.
    • Neel Nanda's Transformer Explainer (a 2-parter) is a great video resource for understanding transformers from one of the leading researchers in the field. He provides not only a great overview but accompanying code to walk through as well. I highly recommend reading the description of the first video; he provides a ton of follow-up resources to learn more.
    • Callum McDougall's ARENA program, which covers all the topics needed to get started in AI safety technical research, is pretty comprehensive. I plan to start at chapter 0 and go through the entire thing. If you only want the mechanistic interpretability portion, that's chapter 1, but I'd still do chapter 0 first so you aren't lost.
  5. Hands-On Projects and Community Engagement: At this point I'm hopefully at a place where I can start applying what I've learned to open problems in the mechanistic interpretability space. This could involve collaborating with others in the field by participating in open source research discussions, contributing to relevant open source projects, or working for organizations focused on AI safety and interpretability. In a perfect world, I'd be doing all three.

Time will tell though, and I'm not rushing anything either. I fully understand that getting the fundamentals down is arguably the most important part of this whole process, and I want to make sure I have a solid grasp of them before diving into hands-on work.

As I come across more resources and opportunities, I'll take my time integrating them into my learning process and share them on this blog in case others find them helpful.

Thanks for reading.