26 June 2019

Machine Learning Starter Kit

A lot of people ask me how to get started with machine learning/artificial intelligence, so took what I usually say and wrote it down in one place!

Many people have spent many hours writing down and illustrating how to get started with machine learning, so this will mostly be links to them. Also both the community and this post often use ML/AI (machine learning/artificial intelligence) interchangeably. Deep learning (DL) is ML at scale, typically referring to neural networks.

Getting started

Most ML material nowadays is in Python (even if it calls out to C++ or other more performant code) so being comfortable with Python code is a must.

All of the algorithms and ideas used today in ML are built using linear algebra, calculus, differential equations, and/or probability/statistics. There are a lot of great materials online for these including but not limited to:

Paul's Online Notes
entire university courses like those on MIT Open CourseWare
YouTube channels, I particularly love 3Blue1Brown because of their great use of visualizations to explain concepts (one example being their matrix determinant video)

One excellent resource to get started with machine learning is the Google ML crash course: developers.google.com/machine-learning/crash-course. This is a comprehensive introduction to many basic machine learning concepts you should be familiar with, in addition to more modern deep learning techniques like neural networks.

There's also neuralnetworksanddeeplearning.com which nicely builds up how neural networks work. It also includes some useful tricks people use to get neural nets to work well.

Going further

Researchers are more frequently running into the issue of cramming their interactive results into a static text PDF format. Some got together and made distill.pub, which is a beautiful site with interactive posts that explain technical details of machine learning.

As I mentioned before, more and more university classes are putting their content online for free. This includes some advanced machine learning classes, such as those at Stanford:

These sites have course outlines, lecture slides, videos, homework assignments, and suggested readings which you may find useful.

Additionally, there is also the "Deep Learning" book available for free at deeplearningbook.org. This covers a comprehensive list of important topics and dives into the mathematical and theoretical background behind deep learning.

Social media

Something that has had a noticeably positive effect on my ML career has been making a professional ML Twitter account. The ML community is quite active on Twitter, and they often post papers/projects/ideas that they are working on or find interesting. It's a great format for getting a pulse of the field, although I do have some words of caution: it is quite easy to get FOMO or feel overwhelmed because the field moves so fast, but you just have to remember that people only post when their work is ready for the public eye. You don't see the other 99.9% of the effort that went into making the finished product, and rarely do people publish negative results (although I believe this should be fixed!) As a shameless plug here is my ML Twitter account, I recommend going through the people I follow and following those whose feeds you find interesting.

Reading papers

Other powerful resources you'll soon come across are pre-print servers, namely arXiv (pronounced archive) and OpenReview (OpenReview for the ICLR 2019 conference.) These are websites where anyone can upload a PDF to share with the community. Often times you'll see papers that are published in conferences here, and many that are not; keep in mind that while most of the time people upload with care, the results in papers that have not been published in conferences may not have been peer reviewed to the same degree as others.

When reading machine learning papers, they often follow a common format. There are no rules around this and definitely not all papers adhere to this format, but many do:

Introduction: a general framing of the problem/methods discussed in the paper. Usually outlines the importance of the work by giving real world examples of problems it solves. Often times explicitly lists the contributions of the paper.
Background/Related Works: an overview of related previous work done in the area of the paper. Very useful if you are looking to get into a subfield and need more information to read! An example of an extremely in depth (and longer than average) related work section is Section 3 of Measuring the Effects of Data Parallelism on Neural Network Training
Methods: details about the contributions of the work, sometimes split into multiple sections.
Experiments/Results: details about the results of the paper, sometimes split into multiple sections. Often this is where the important result figures in a paper are.
Discussion: a recap of the significance of the methods and results of the work. Sometimes authors will list what they believe are promising directions of future work, so if you are looking for ideas to work on (or looking for how to think of ideas to work on) this can be useful.
Appendix: extra information/figures that didn't fit into the previous sections within a reasonable (sometimes conference enforced) page limit. If you are looking to reproduce a paper this is often the place the authors include the fine details of their experiment setups. An excellent example of appendices is the BigGAN paper; I especially like their negative results appendix where they discuss the things they tried that did not work. Including these negative experiments is immensely useful for researchers looking to replicate and expand on your work!

Another example where you can see all these sections is in our recent paper on uncertainty calibration in deep learning; take a look at the PDF and you'll see all the pieces mentioned!

Trying ML yourself

One pain-free way to start programming with ML is to use Google Colab, which is a Python notebook where the code runs on Google Cloud so you don't have to install packages/drivers yourself. You can even get a free GPU or TPU with Colab! TPUs are Google's new hardware built specifically for speeding up machine learning. You can run on a GPU by selecting a GPU runtime by going to "Runtime" > "Change runtime type", and here are some Colab TPU examples.

Interesting applications

If the previous info didn't pique your interest then check out some of these recent applications of ML I think are interesting:

OpenAI Five: OpenAI trained an AI to play Dota 2 at a professional level (in general the OpenAI blog has some great posts!)
Project Magenta: a Google Research group working on AI for creative applications such as writing music and art.
BigGAN interactive Colab: a Google Colab where you can play with a copy of one of the currently state of the art image generation models to make realistic and/or trippy pictures.
GPT-2: a language model trained by OpenAI that generates impressively realistic stories (albeit ungrounded from reality.) They are so realistic that they chose not to immediately release the pre-trained model due to concerns of abuse.
TimbreTron: a model that given an audio clip can generate the same clip with a different timbre, as if played by another instrument.
OpenAI Gym: a framework for reinforcement learning (RL) agents to compete in, on various games and levels (you can see the best performing models on their leaderboard.)

Parting advice

If/when you come up with ideas you want to try, search for related projects first. There are a lot of people in the field and it's growing every day, so it is not unlikely that someone already explored things you are thinking about. If they have, don't be upset; it happens all the time and if anything is validation that your idea is a good one! Plus there are always plenty more ideas to try.
The ML/AI field moves very fast and can be hard to keep up with, so I recommend first exploring the different subjects of ML and then trying to focus on only a few. Many areas are quite interesting though and it can be hard to narrow down your scope, so I have found that sometimes it is easier to select subfields that you are not interested in and temporarily ignore them, then catch up only when you need to, and focus on the remaining areas.
Keep a list somewhere (digital or physical) and write down any idea that comes to you, no matter how silly it seems at the time.

I hope this helps you get started on your ML path, feel free to reach out on Twitter with any questions!