
I'm taking a short detour (or more likely, a parallel path) from my work on Claude and other agentic coding tools to dig into deep learning concepts. I've heard good things about Grokking Deep Learning by Andrew Trask as a foundational text, so I'm starting there. In particular, I like a few things about the book so far:

1. Limited math skills required

It's easy to get overwhelmed trying to refresh myself on linear algebra, matrices, and other things I only sort of remember from high school and college. This book explains the core concepts in an accessible way, without leaning too heavily on the complex math. I actually took a refresher linear algebra course online a few months ago, but I found that focusing so much on the math (and not enough on building stuff) was really taking the wind out of my sails.

2. Building with foundational Python components rather than diving straight into libraries like PyTorch

While most real work in the field is done in PyTorch or other sophisticated libraries, it's easy to let the library do all of the heavy lifting and never fully grasp what's happening under the hood. Building with basic data structures (first plain Python lists, then NumPy arrays) strips away the magic and reinforces what is actually happening; there's a small sketch of what I mean after this list.

3. Project-driven work in Jupyter Notebooks

The book is structured around discrete exercises that can be written, run, and experimented with in Jupyter notebooks. This lowers the friction and makes playing with the results easy and engaging.
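
To give a flavor of point 2, here's a tiny sketch of the kind of network the early chapters start with. This is my own illustration with made-up numbers, not code from the book: a prediction is just a weighted sum of inputs, first spelled out with plain lists and a loop, then collapsed into a NumPy dot product.

```python
# A toy single-neuron "network": prediction = weighted sum of inputs.
# My own sketch with made-up numbers, not code from the book.

def neural_network(inputs, weights):
    assert len(inputs) == len(weights)
    prediction = 0.0
    for inp, wgt in zip(inputs, weights):
        prediction += inp * wgt
    return prediction

inputs = [8.5, 0.65, 1.2]   # hypothetical input features
weights = [0.1, 0.2, 0.0]   # hypothetical weights
print(neural_network(inputs, weights))  # ≈ 0.98

# Once NumPy enters the picture, the loop collapses to a dot product:
import numpy as np
print(np.dot(inputs, weights))  # ≈ 0.98
```

Seeing the loop spelled out first makes it obvious what the dot product is actually doing when the library version shows up later.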

I used ChatGPT to scaffold a notebook for each chapter, where I can reproduce the code samples and make notes for myself to make sure I'm really nailing down the concepts. I'll share my progress as I go with occasional posts, and in this repo.

I didn't get quite as much done today as I had hoped, but I'm not “off track”, either.

Currently, I'm working through 3 courses:

  • A synchronous Data Structures & Algorithms class through UC Berkeley Extension
  • An asynchronous Linear Algebra class through Georgia Tech, offered via EdX
  • Harvard's CS50AI course, an introduction to programming AI algorithms in Python

DS&A

The DS&A class is definitely the highest priority, because it has a set class meeting day/time and real deadlines, and it cost around $1000 to take since it's for-credit. So far, it's going well and I'm enjoying working on the coding assignments.

I've been using Google Colab to do the assignments in Python notebooks, which has been a really nice workflow and allows me to pick up the assignment at random times regardless of whether I'm on my work or personal laptop.

Linear Algebra

This class is kind of a slog. I do feel like I'm progressing and it's starting to click, and I know that linear algebra is a big part of the math underpinning artificial intelligence programming, but it still feels very abstract while I'm working on the assignments. Maybe it will feel more engaging once I start applying it to the AI domain after this class.

CS50AI

Overall I think this class is great. The lectures are interesting and very well-produced. My one criticism is that the lectures and examples generally feel very straightforward, and then the assignments feel extremely difficult. I guess this was a common thing when I took classes in undergrad as well.

ChatGPT & Ultralearning

I've been using ChatGPT as a tutor for all of these classes. In particular, it's great for the data structures assignments, because it understands Jupyter notebooks: I can upload my homework and get feedback on it directly.

I've been prompting it to give me feedback using the Socratic method, and to never explicitly give me the answer to a problem unless I ask for it directly. It asks me questions, helps me unpack my thinking, and generally guides me to reason and explore my way to a solution.
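
For the curious, my standing instruction looks something like this (paraphrased from memory, not the exact prompt): “Act as my tutor. Use the Socratic method: ask me guiding questions and help me unpack my reasoning, but never give me the answer to a problem unless I explicitly ask for it.”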

I've also been trying to use some of the techniques from the book Ultralearning to both accelerate my learning pace and increase retention. One of the book's main ideas: if you're trying to learn something new, just re-reading your notes doesn't work very well. Instead, you should read or watch a lecture, spend a reasonable amount of time away from the material, and then sit down and try to write out as much of it as you can recall: definitions, theorems, formulas, and so on. This is super effective, because retrieval practice significantly strengthens memory.

The problem is that it's hard, and in the moment it feels like you aren't progressing.

Reading over your notes feels productive, but actually isn't. Sitting down with a blank sheet of paper and writing what you can remember feels like a struggle, and like you're getting your ass kicked by the material.

But it works.