Udacity Machine Learning Nanodegree – First Impressions
I recently signed up for the Udacity Machine Learning Nanodegree (MLND). I’ve also decided to write occasional blog posts, both to keep track of what I’ve learned and worked on, and to help anyone considering the class figure out whether it’s a good fit for them.
The first big assignment for the class is analyzing a set of data regarding the passengers aboard the Titanic. In short, you go through the process of creating a classifier to predict whether someone would have survived the ship sinking or not. The simplest version of this is “men died, women survived.” The next layer is “men died, women and children survived.” You continue to iterate and train your classifier, and then test it against known outcomes until you are able to predict outcomes with >80% accuracy. To reach this metric, I ended up having to apply a few convoluted filters relating to passenger cabin class, sex, and number of siblings/family members on board.
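To make the iteration concrete, here is a minimal sketch in Python of the first two rule layers described above. It is not the project’s actual code: the records are made up by hand for illustration (they are not real Titanic data), and the field names, the function names, and the age cutoff of 10 are all assumptions.

```python
# Hypothetical, hand-made records for illustration only (not real Titanic data).
passengers = [
    {"sex": "female", "age": 29, "pclass": 1, "survived": 1},
    {"sex": "male", "age": 40, "pclass": 3, "survived": 0},
    {"sex": "male", "age": 8, "pclass": 2, "survived": 1},
    {"sex": "female", "age": 35, "pclass": 3, "survived": 0},
    {"sex": "male", "age": 55, "pclass": 1, "survived": 0},
]


def predict_v1(passenger):
    """First layer: men died, women survived."""
    return 1 if passenger["sex"] == "female" else 0


def predict_v2(passenger):
    """Second layer: men died, women and children survived.

    The age cutoff of 10 is an assumption chosen for this sketch.
    """
    if passenger["sex"] == "female":
        return 1
    return 1 if passenger["age"] < 10 else 0


def accuracy(predict, records):
    """Fraction of records where the prediction matches the known outcome."""
    correct = sum(1 for r in records if predict(r) == r["survived"])
    return correct / len(records)


print(accuracy(predict_v1, passengers))
print(accuracy(predict_v2, passengers))
```

Each new rule is only worth keeping if it raises the accuracy against the known outcomes, which is the loop the project has you repeat until you pass 80%.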
I enjoyed this project, as it was interesting from a historical perspective and it was fun to work with data relating to an event that is familiar to everyone. I especially enjoyed theorizing and then testing my assumptions. For example, I initially suspected that much wealthier passengers (as indicated by the class of cabin they stayed in) would be far more likely to survive. However, this was not shown by the data. Gender and age seemed to override any significant benefit conveyed by being a wealthier passenger.
This was a great first project and quickly demonstrated how you can apply some rudimentary machine learning to understand data and make simple predictions. I also really appreciated that Udacity provides personalized feedback on your project submissions. While most of the in-lesson content is graded automatically and in real time, project submissions go to something like a college course TA, and you’re provided with a written assessment specific to your submission. In my opinion, this is one of the most significant features of the paid nanodegree program.
The project above seems to be a good framework for understanding the class structure. Each module consists of many lessons covering different aspects of a topic. Modules include ML Foundations, Supervised Learning, Unsupervised Learning, Deep Learning, etc. The lessons cover the theory, then simple applications of the techniques. At the end of each module, there is a project that requires applying the techniques learned in the module. I enjoyed this approach in the first module, and am interested to see how it plays out for more challenging topics. I expect there will be areas where it is easy to move through the exercises very quickly, but the module project will require going back and reviewing/referencing previous instruction.
The final, or capstone, project for the course is essentially “go find some data and try to answer a question/solve a problem with it.” This assignment is intentionally open ended, and you’re encouraged to think about the types of problems to which you could apply machine learning to predict outcomes. I’ve already started a Google Doc where I’m collecting ideas as they come to me, and have ideas ranging from identifying insurance fraud, to predicting weight loss outcomes, to improving the artificial intelligence that controls the enemies in a video game.
One thing I’ve previously found frustrating whenever attempting to learn any new technical skill is that there only seem to be two types of resources.
The first type is written for an absolute beginner. If you’ve ever tried to learn a programming language, you’ve undoubtedly seen many of these. It will start by explaining the concepts of variables, data types, loops, etc. Each concept is introduced and explained in painstaking detail. The problem is that if you know any language, you already conceptually understand these ideas and want to dive into applying them to solving problems. The second type has the opposite problem. It assumes a level of depth and expertise that requires years of previous experience. These resources will immediately leave you feeling like you’re swimming in the deep end of the pool. You’ll read, then reread, a chapter and find it difficult to absorb much of anything. The sweet spot is right in the middle of these two: finding out where the learner is and challenging them so that they keep growing, without getting frustrated and giving up.
So far, I’ve been very impressed with the way the MLND handles this problem. For instance, doing any type of data science or machine learning is going to require a certain amount of familiarity with basic statistics. I took a 200-level stats class in college, but haven’t really thought about it since then. While I still know the basics (mean, median, mode) and could at least vaguely explain variance or standard deviation, these topics are not fresh in my mind. The MLND coursework includes a quick refresher on statistics. In total, I probably spent about 2 hours working through some tedious manual calculations, then creating a spreadsheet to easily compute variance and standard deviation from a set of values. This was far from a full semester-long stats class. However, it provided me with enough familiarity to be productive, and I could always use additional reference materials if needed. In fact, the class links out to other full courses on statistics online. This is useful, but seems like a long (and given the completion-time based pricing, potentially expensive) detour to take in the middle of the class. I address this below in the Recommendations section.
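For anyone who wants the same refresher without a spreadsheet, here is a rough Python equivalent of the manual calculation. It is a sketch under two assumptions: the sample values are made up, and it uses the population formulas (dividing by n rather than n - 1), which may not be the convention the course uses.

```python
import math
import statistics


def variance(values):
    """Population variance: the mean of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def std_dev(values):
    """Population standard deviation: the square root of the variance."""
    return math.sqrt(variance(values))


# Made-up values for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

print(variance(values))  # 4.0
print(std_dev(values))   # 2.0

# Cross-check the manual version against the standard library.
print(statistics.pvariance(values))
print(statistics.pstdev(values))
```

The mean here is 5 and the squared deviations sum to 32, so the variance is 32 / 8 = 4 and the standard deviation is 2.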
The course also offers membership to a Slack team (appropriately named MLND). This is exactly what you’d expect: a place to discuss issues/problems with the coursework, generally discuss machine learning topics and news, and perhaps do a bit of professional networking. This seems to be a fairly valuable resource, while not costing Udacity much at all to offer, since it is largely community-driven. Several of the most active people on the chat are former students who help others along through the coursework.
Udacity really emphasizes the career resource portions of this program. In fact, they offer a $299/month version of the program that guarantees job placement or a full refund. I haven’t explored the career sections extensively yet, and I likely won’t be spending much of my time here (for reasons covered in the next section). However, I’ll generally say that if you are trying to make a career change or need guidance in this area, this program appears to be strong in the career guidance domain. I’ll provide more thoughts on those sections in future posts on the MLND if I dig into this material more.
What am I paying for?
An important question to consider when signing up for something like a Nanodegree is “What am I paying for?” Being specific about this will make it much easier to determine if it was worth the money in the first place, as you’ll be able to evaluate the results against your initial goal. In my case, I’m paying for accountability. The MLND is like my own data science/programming bootcamp. Similar to working with a personal trainer at a gym, I’ve been assigned a mentor and I communicate weekly learning goals to him. This makes me accountable to a third party, which can be very effective for increasing compliance. In this case compliance equates to regularly working on the course materials and making tangible progress.
Moreover, I’ve signed up and put down my $199/month to get some figurative skin in the game. Nanodegrees generally don’t have a fixed schedule and allow you to work at your own pace. This means you have a financial incentive to be diligent and move quickly. If you complete the program in 2 months, rather than the estimated 6 months, you’ve saved $796 on tuition fees. However, there’s an obvious balance to be struck between saving money on tuition and racing through the exercises without giving yourself time to absorb the material in a way that will allow you to apply it in the future beyond the structured exercises of the course.
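The savings figure above is simple arithmetic, sketched here in Python so you can plug in your own pace. The $199 monthly fee and the 6-month estimate come from this post; the function and constant names are just illustrative.

```python
MONTHLY_FEE = 199       # dollars per month, as quoted in this post
ESTIMATED_MONTHS = 6    # estimated time to complete the program


def tuition_saved(actual_months, monthly_fee=MONTHLY_FEE,
                  estimated_months=ESTIMATED_MONTHS):
    """Dollars saved by finishing in actual_months instead of the estimate."""
    return (estimated_months - actual_months) * monthly_fee


print(tuition_saved(2))  # 796
print(tuition_saved(3))  # 597
```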
It’s important to emphasize that I’ve asked what I am paying for. Other students will likely have different motivations than I do. For example, Udacity emphasizes their ties to impressive tech brands such as Facebook and Google. For someone looking to get into the hiring pipeline at one of these companies, the Nanodegree might be a great starting point. For me, however, this is less important. I’ve previously worked at Google and currently work at Uber. I’m happy in my current role and doing this for personal growth/development, rather than trying to jumpstart or pivot my career. Given my background and the strength of the companies already on my resume, it’s been my experience that I can at least get my foot in the door for a position if my candidacy for the role makes any sense at all.
I’m enjoying the MLND program, but there are some things I would recommend to anyone considering signing up for it (or any nanodegree). As mentioned previously, these nanodegree programs tend to be priced based on time to complete. You pay a fee each month until you’ve “graduated” from the class. However, many of the components of the classes are available free to individuals not enrolled in the program. In fact, when I first signed into my account after enrolling, I had already completed ~20% of the Unsupervised Learning section on my own. You can complete a significant amount of work before signing up and paying anything for the class. Is this a good idea? It’s hard to say.
The MLND definitely provides a better structure and some strong guard rails to keep you on track in the course. In my case, it’s also helping me with accountability and keeping me moving down the path. However, I’d expect that if you were highly motivated on your own, and wanted to save some money, you could knock out a large portion of the course content for free, and then enroll and go through the other material in a fraction of the time. The decision of whether or not to approach the program this way depends on personal factors such as comfort with the price of the program, personal level of motivation/engagement, and ability to deal with less guidance.
Given my experiences working on and off of the MLND track, I’d probably recommend enrolling in the class and trying to dedicate large blocks of time to it each week in order to complete the full course in 2-3 months. I do think it would be worth completing any prerequisites prior to enrolling. For instance, if you know you’re extremely rusty in statistics, work through a book or complete an online course before signing up for the nanodegree. Spending 2-4 weeks paying for the program while doing supplemental assignments to fill in gaps seems unwise to me.