Prerequisites#

This guide explains how to set up a local development environment for this course.

Python#

You first need to set up a Python environment (if you have not done so already). The easiest way to do this is by installing Miniconda, which installs Python together with the conda package manager and a small set of supporting packages. We will be using Python 3, so be sure to install the right version. Always choose the 64-bit installer (if your machine supports it); we recommend Python 3.10 or later.
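
To confirm that the interpreter you ended up with matches these recommendations, a small check like the following can help. This is a minimal sketch: the helper names meets_minimum and is_64bit are our own for illustration, and the 3.10 threshold is simply the recommendation from this guide.

```python
import platform
import struct
import sys

MIN_VERSION = (3, 10)  # minimum Python version recommended in this guide


def meets_minimum(version, minimum=MIN_VERSION):
    """Return True if a (major, minor, ...) version tuple is at least `minimum`."""
    return tuple(version[:len(minimum)]) >= tuple(minimum)


def is_64bit():
    """Return True if this interpreter uses 64-bit pointers."""
    return struct.calcsize("P") * 8 == 64


if __name__ == "__main__":
    print("Python version:", platform.python_version())
    print("Meets the 3.10+ recommendation:", meets_minimum(sys.version_info))
    print("64-bit interpreter:", is_64bit())
```

Run it with the same python command you plan to use for the course, since several Python installations can coexist on one machine.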

If you are completely new to Python, we recommend reading the Python Data Science Handbook or taking an introductory online course, such as the Definite Guide to Python, the Whirlwind Tour of Python, or this Python Course. If you like a step-by-step approach, try the DataCamp Intro to Python for Data Science.

To practice your skills, try some Hackerrank challenges.

OS specific notes#

Windows users: If you are new to Anaconda, read the starting guide. You’ll probably use the Anaconda Prompt to run any commands or to start Jupyter Lab.

Mac users: You’ll probably use your terminal to run any commands or to start Jupyter Lab. Make sure that you have the Command Line Tools installed. If not, run xcode-select --install. You won’t need a full Xcode installation.

All: Install the correct version of graphviz according to your OS.

Apple silicon (M1/M2)#

If you have a laptop with Apple silicon (M1/M2), this guide may help you install a TensorFlow version that makes effective use of the GPU.

This procedure has been known to work using Miniconda3 and Python 3.10 on M1 chips:

cd to your miniconda directory
conda install -c conda-forge cvxpy
pip install "tensorflow"
pip install "tensorflow-metal"

The conda install of cvxpy resolves issues with libraries that have poor M1 support (e.g. fancyimpute).
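
After the installation, you may want to check whether TensorFlow actually detects the GPU. The sketch below uses tf.config.list_physical_devices, which is part of TensorFlow 2; the wrapper function visible_gpus is a hypothetical helper of ours, and it returns None when TensorFlow cannot be imported at all.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see.

    Returns None if TensorFlow is not installed in this environment.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment.")
    elif gpus:
        print("TensorFlow sees these GPUs:", gpus)
    else:
        print("TensorFlow is installed but sees no GPU.")
```

An empty list means TensorFlow runs on the CPU only, which is still sufficient for most of the course but slower for the deep learning parts.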

Virtual environments#

If you already have a custom Python environment set up, possibly using a different Python version, we highly recommend setting up a virtual environment to avoid interference with other projects and classes. This is not strictly needed if you use a fresh Anaconda install, since that will automatically create a new environment on installation.

Using conda#

To create a new conda environment called ‘mlcourse’ (or whatever you like), run

conda create -n mlcourse python=3.10

You activate the environment with conda activate mlcourse and deactivate it with conda deactivate.

Using virtualenv#

You can also use virtualenv if you prefer:

pip install virtualenv
virtualenv mlcourse

Activate the environment with source mlcourse/bin/activate on Linux or macOS, or with mlcourse\Scripts\activate on Windows. To deactivate the virtual environment, type deactivate.
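
If you are unsure whether an environment is currently active, you can ask Python itself. The following is a rough sketch, not an exhaustive detector: it assumes that a venv or a recent virtualenv (version 20 or later) makes sys.prefix differ from sys.base_prefix, and that conda activate sets the CONDA_DEFAULT_ENV environment variable. The function name describe_environment is our own.

```python
import os
import sys


def describe_environment(prefix=None, base_prefix=None, environ=None):
    """Describe the active Python environment in one line.

    The arguments default to the values of the running interpreter;
    they are parameters only so the logic can be tried with other values.
    """
    prefix = sys.prefix if prefix is None else prefix
    if base_prefix is None:
        base_prefix = getattr(sys, "base_prefix", sys.prefix)
    environ = os.environ if environ is None else environ

    # Assumption: venv/virtualenv redirect sys.prefix to the environment folder.
    if prefix != base_prefix:
        return "virtualenv/venv at " + prefix
    # Assumption: `conda activate` exports the environment name in this variable.
    conda_env = environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return "conda environment '" + conda_env + "'"
    return "no isolated environment detected"


if __name__ == "__main__":
    print(describe_environment())
```

With the conda environment from this guide active, this should report the name mlcourse; in a fresh terminal without any activation it will typically report either the conda base environment or no isolated environment.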

Installing TensorFlow#

To install TensorFlow 2 (if you haven’t already), follow these instructions for your OS (Windows, Mac, Ubuntu). For Apple M1 machines, see the procedure above under ‘OS specific notes’. While installation with conda is possible, the TensorFlow instructions recommend installing with pip, even with an Anaconda setup. We recommend using TensorFlow 2.7 or later.
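
To verify that the installed TensorFlow meets the 2.7 recommendation, you can compare its version string against that minimum. The sketch below is deliberately simple: parse_version and tensorflow_is_recent are hypothetical helpers that only look at the leading numeric components of the version string, so suffixes such as rc1 are ignored.

```python
import re

MIN_TENSORFLOW = (2, 7)  # minimum TensorFlow version recommended in this guide


def parse_version(text):
    """Extract the leading numeric components of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def tensorflow_is_recent(version_text, minimum=MIN_TENSORFLOW):
    """Return True if `version_text` is at least `minimum`."""
    return parse_version(version_text)[:len(minimum)] >= tuple(minimum)


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment.")
    else:
        print("TensorFlow version:", tf.__version__)
        print("Meets the 2.7+ recommendation:", tensorflow_is_recent(tf.__version__))
```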

Course materials on GitHub#

The course materials are available on GitHub, so that you can easily pull (download) the latest updates. We recommend installing git (if you haven’t already) and then cloning the repository from the command line (you can also use a GUI):

git clone https://github.com/ML-course/master.git

To download updates later, run git pull from inside the cloned folder.

For more details on using git, see the GitHub 10-minute tutorial and Git for Ages 4 and up. We’ll use git extensively in the course (e.g., to submit assignments).

Alternatively, you can download the course as a .zip file. Click ‘Code’ and then ‘Download ZIP’. Or, download individual files with right-click -> Save Link As…

Installing required packages#

Next, you’ll need to install several packages that we’ll be using extensively in this course, using pip (the Python package installer, which downloads packages from the Python Package Index).
Run the following from the folder where you cloned (or downloaded) the course, or adjust the path to the requirements.txt file:

pip install --upgrade pip
pip install -U -r requirements.txt

Note: the -U (--upgrade) option upgrades the listed packages to the newest versions that requirements.txt allows, should you have older versions already installed.
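
If an import fails later on, it can help to check which of the required packages are actually installed in the active environment. The following sketch relies on importlib.metadata from the standard library. It assumes that requirements.txt contains simple lines of the form name, name==version or name>=version; the helper names requirement_name and missing_packages are ours, and the script does not check version constraints.

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_name(line):
    """Return the package name from a simple requirement line, or None."""
    line = line.split("#", 1)[0].strip()  # drop comments and whitespace
    if not line or line.startswith("-"):  # skip blanks and pip options like -r
        return None
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return match.group(0) if match else None


def missing_packages(lines):
    """Return the names of the required packages that are not installed."""
    missing = []
    for line in lines:
        name = requirement_name(line)
        if name is None:
            continue
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    path = Path("requirements.txt")  # assumes you run this from the course folder
    if path.exists():
        missing = missing_packages(path.read_text().splitlines())
        print("Missing packages:", ", ".join(missing) if missing else "none")
    else:
        print("No requirements.txt found in the current folder.")
```

If the script lists missing packages, re-run the pip install command above in the environment you intend to use for the course.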

Running the course notebooks#

As our coding environment, we’ll be using Jupyter notebooks. They interleave documentation (in markdown) with executable Python code, and they run in your browser. That means that you can easily edit and re-run all the code in this course. If you are new to notebooks, take this quick tutorial, or this more detailed one. Optionally, for a more in-depth coverage, try the DataCamp tutorial.

Run jupyter lab from the folder where you have downloaded (or cloned) the course materials, using the Python environment you created above.

jupyter lab

A browser window should open with all course materials. Open one of the chapters and check that you can execute all code by clicking Run > Run All Cells (or Cell > Run All in the classic notebook interface). You can shut down the Jupyter server by pressing CTRL-C in your terminal.

An alternative: Google Colab#

Google Colab allows you to run notebooks in your browser without any local installation. It also provides (limited) GPU resources. It is a useful alternative in case you encounter issues with your local installation or don’t have it available, or to easily use GPUs.

The course overview page has buttons to launch all materials in Colab (or Binder), or you can upload the notebooks to Colab yourself.