A Docker configuration for machine learning

Why Docker for machine learning?

  • Portability: Move your ML environment easily from AWS to Google Cloud to Azure to your laptop to corporate cloud.
  • Reproduceability: Save the image of the Docker container that ran a set of research experiments in a repository, and as long as the repo, Docker and suitable hardware are around, you can reproduce the environment and experiments exactly.
  • Production integration: Modern cloud-based applications are increasingly loosely coupled microservices in containers. Take an ML model container, add a Flask or FastAPI REST API, and you have a (nearly) production-ready microservice, or at least one in a package the production team can easily understand and integrate.
  • Cookiecutter templates: Using the cookiecutter package and a cookiecutter data science template, everyone on your team can use similar standard environments and easily share and understand each other’s work, and use a common template for the docker setup.

Once you’ve captured your machine learning infrastructure as code, you can save it, move it around and reproduce many copies of complex environments with one or two commands.

This is set up to run on an instance with 50GB of disk space and GPU. Caveat: I’m a Docker neophyte, this may not follow best practices, I am trying to install everything under the sun I may need, tested only on Ubuntu, YMMV.

1) Install Docker per website docs, TL;DR:

a) Install Docker prerequisites

sudo apt-get -y install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common 

b) Add GPG key and repository to apt

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"

c) Install

sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io

d) Add user to Docker group (log in and out to make changes take effect)

sudo usermod -aG docker ${USER}

e) Test Docker

docker run hello-world

2) Install nvidia-docker for GPU support (skip if you don’t have/need Nvidia GPU support) Before installing, run nvidia-smi to make sure you have OS GPU support / drivers. See Nvidia for more information.

a) Add Nvidia apt key

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -

b) Add apt repos

distrib=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -sL https://nvidia.github.io/nvidia-docker/$distrib/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update

c) Install nvidia-docker2

sudo apt install -y nvidia-docker2

d) Restart Docker

sudo pkill -SIGHUP dockerd

e) Test container, should see GPU (takes a minute, downloads a couple of gigs)

docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

3) Build the docker_ml container

  • Clone this repo. cd to the top-level directory containing Dockerfile
  • If you don’t want GPU support, edit the Dockerfile
    • comment out FROM nvidia/cuda
    • uncomment FROM ubuntu
  • Build the container (this will take a long long time (~20 minutes) and > 10GB of disk space)
docker build -t docker_ml .

4) Run the container

Once built it launches in seconds.

  • For Jupyter:

    docker run --runtime=nvidia --name docker_ml -p 8888:8888 -v "$PWD:/home/ubuntu/local" --rm docker_ml
    
    • Current working directory on the host will be mounted as /home/ubuntu/local in the container, edit as necessary for access to your notebooks

    • Connect to the Jupyter server on port 8888 using https:// . Ignore warnings about the self-signed https certificate. If you are connecting to a remote server via a firewall, make sure that port allows incoming traffic from your IP.

    • Password will be root

    • To check for functioning GPU, create a new Python notebook and say

      import tensorflow as tf
      print(tf.__version__)
      print(tf.config.list_physical_devices('GPU'))
          
      2.1.0
      [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
      
  • For a command line:

docker run --runtime=nvidia --name docker_ml -p 8888:8888 -v "$PWD:/home/ubuntu/local" -it --rm docker_ml bash
  • Docker is beyond the scope of this document. See for example:

    • https://docker-curriculum.com/
    • http://dockerlabs.collabnix.com/docker/cheatsheet/
    • https://dockerbook.com/
    • Neverthless. a few commands to get started:
    # show containers
    docker ps -a
    # show images
    docker image ls
    # stop a running container
    docker stop <container_id>
    # remove a container
    docker rm <container_id>
    # remove an image
    docker image rm <container_id>
    # clean up images to save disk space
    docker image prune --all
    # clean up containers
    docker container prune