Lab 04 - Docker + VSCode

Dockerization + VSCode

1. Activity Identity

Activity title	Introduction to Robotics
Topic	Docker / DevOps / IDE
Authors	Institute of Robotics and Machine Intelligence: Dominik Belter, Jakub Chudziński, Marcin Czajka, Kamil Młodzikowski
Target learners	Bachelor (Computer Science / IT, Robotics)
Estimated duration	1.5 hours
Difficulty level	Beginner
FOSSBot environment	Linux workstation
Licence	CC BY 4.0

2. Learning Objectives and Competences

ID	Learning outcome	Related competences	Assessment evidence
LO1	Students will be able to pull images and start, inspect, enter and stop containers using basic Docker CLI commands (`pull`, `run`, `ps`, `exec`, `stop`, `rm`).	Knowledge of containerisation tools; selecting programming tools	Screenshot of `docker ps` and `curl` against a running container (Submission item 1)
LO2	Students will be able to write a `Dockerfile` and build a custom image that packages a Python application together with its dependencies.	Selecting programming tools; using libraries for designing robot software components	Screenshot of the built image (Submission item 2)
LO3	Students will be able to use Docker Compose (`docker compose`) to run a multi-service setup and use VSCode Dev Containers to develop inside a container.	Selecting programming tools; integrating tooling for robot software development	Screenshots of `docker compose ps -a` and the VSCode dev container (Submission items 3 and 4)

3. Prerequisites

A workstation running Linux with a working network connection.
Basic computer literacy: comfortable using a keyboard and mouse, opening applications, capturing screenshots.
Basic terminal skills (Lab 1 covers everything you need).

4. Required Material and Setup

Category	Item	Version / Quantity	Notes
Hardware	Workstation	1 per student	Any Linux PC.
Software	Docker Engine	pre-installed on the lab workstations	Lab 4 assumes you can run `docker` without `sudo`.
Software	VSCode + Dev Containers extension	pre-installed on the lab workstations	The extension is published as `ms-vscode-remote.remote-containers`.
Software	`git`	bundled with most Linux distributions	Used to clone the starter repository.
Starter code	`fossbot-text-to-cmd`	from GitHub	Contains the application you will containerise. Pull a fresh clone in Step 1.
Hardware	NVIDIA GPU + container toolkit (optional)	only used in Step 7	Required only for the GPU bonus step. Skip if not available.

5. Safety, Ethics and Accessibility Notes

The only risks in this lab are operational:

docker run pulls images from public registries. Only run images you trust (the lab uses official images from Docker Hub).
docker system prune and docker volume rm permanently delete data. Read every destructive command before pressing Enter.
Bind mounts expose part of your host filesystem to the container. A misbehaving (or malicious) program inside the container can modify those files.

6. Scenario and Problem Statement

In Lab 3 you built a command-line application that translates natural-language commands into wheel motor speeds. It runs locally inside a venv with several Python dependencies (scikit-learn, sentence-transformers, torch). Distributing it to a colleague means asking them to install the right Python version and the right libraries on their own machine - a step that breaks more often than not in practice.

In this lab you will package the same application into a Docker image so that anyone with Docker installed can run it with a single command. You will then learn how to:

Develop inside a container using VSCode Dev Containers, so your editor uses the container’s Python and libraries (no host-side venv needed).
Compose multiple services together with docker compose.
Pass GPU access through to a container - the standard pattern for AI workloads.

7. Lab Workflow

Phase	Student action	Expected output	Time
1. Setup	Verify Docker, clone the starter	`docker --version` works; starter cloned	5 min
2. Concepts	Read about how containers differ from VMs	Working mental model of containers	10 min
3. First container	Run `hello-world` and an interactive Ubuntu shell	Two containers run successfully	10 min
4. Build image	Write a `Dockerfile` for the text-to-cmd app	A built image runs the application	15 min
5. Volumes & bind mounts	Mount input / output directories into the container	Container reads and writes host files	15 min
6. docker-compose	Wire two services together with a compose file	`docker compose up` starts everything	10 min
7. GPU passthrough (optional)	Run a container with `--gpus all`	`nvidia-smi` works inside the container	5 min
8. VSCode Dev Containers	Create `.devcontainer/devcontainer.json` and reopen in container	VSCode runs inside the image	15 min
9. Bonus: run on the FOSSBot (optional)	Ship the image to the robot and run it there	The classifier produces JSON on the robot	5 min
10. Cleanup	Remove containers, images, starter directory	Clean `/tmp` and Docker state	3 min
11. Reflection	Answer the analysis questions	Short answers	2 min

8. Step-by-Step Instructions

Step 1 - Environment preparation

💡 Lab workstation credentials. Every workstation in the lab uses the same local account: username put, password lrm.

Log in to your lab workstation and open a terminal (Ctrl+Alt+T on Ubuntu).
Clean up state from any previous lab session. Remove leftover screenshots, any starter directory from a previous run, and any Docker artifacts that this lab will (re)create. This matches what Step 10 at the end of the lab tears down, so if the previous user ran their cleanup properly most parts will be no-ops:

docker compose -p fossbot-text-to-cmd down --volumes 2>/dev/null; \
docker rm -f myweb 2>/dev/null; \
docker image rm fossbot-text-to-cmd:latest fossbot-text-to-cmd:gpu 2>/dev/null; \
docker image rm $(docker images --filter "reference=vsc-fossbot-text-to-cmd*" -q) 2>/dev/null; \
rm -rf ~/Pictures/Screenshots /tmp/fossbot-text-to-cmd /tmp/host-output /tmp/host-input.txt

The ; chains the sub-commands so each one runs even if a previous one had nothing to remove, and 2>/dev/null silences the “no such container / no such image” messages on a fresh workstation.
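The difference between ; and && can be tried out without Docker. The sketch below uses a made-up function, nothing_to_remove, as a stand-in for a cleanup command that fails because there is nothing to delete; the function name and its message are illustrative only.

```shell
# A stand-in for a cleanup command that has nothing to remove:
# it prints an error to stderr and returns a non-zero exit status.
nothing_to_remove() {
    echo "Error: no such container" >&2
    return 1
}

# With ';' the second command runs regardless of the first one's exit status.
with_semicolon=$(nothing_to_remove 2>/dev/null; echo "next step ran")

# With '&&' the second command is skipped because the first one failed.
# The trailing '|| true' only keeps this demo from aborting under 'set -e'.
with_and=$(nothing_to_remove 2>/dev/null && echo "next step ran") || true

echo "';'  gives: '${with_semicolon}'"
echo "'&&' gives: '${with_and}'"
```

The first line reports that the next step ran; the second shows an empty result. This is why the cleanup above chains with ; rather than &&.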

Verify that Docker is installed and that you can use it without sudo:

docker --version
docker info

Both commands should print useful output without asking for a password. The first prints the Docker version; the second dumps a summary of the running daemon, including how many images and containers are currently on this workstation.
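If docker info fails with a permission error instead, one likely cause is that your account is not in the docker group. The helper below, in_group, is a hypothetical name introduced here as a minimal sketch of how to check this; it relies only on the standard id, tr and grep utilities.

```shell
# Succeeds (exit status 0) if the current login session belongs to the named group.
# 'id -nG' lists the groups of the running session, so a group added a moment ago
# shows up only after logging out and back in.
in_group() {
    id -nG | tr ' ' '\n' | grep -qx -- "$1"
}

if in_group docker; then
    echo "You are in the 'docker' group: docker should work without sudo."
else
    echo "You are not in the 'docker' group (or have not logged in again yet)."
fi
```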

If Docker is not installed (reference only - the lab workstations come with it)

Follow the official Ubuntu install guide. The headline steps are:

# Remove any older versions
sudo apt remove docker docker-engine docker.io containerd runc

# Install prerequisites
sudo apt update
sudo apt install -y ca-certificates curl gnupg lsb-release

# Add Docker's official GPG key and repository
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
    sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
    https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | \
    sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Allow your user to run docker without sudo
sudo usermod -aG docker $USER
# Log out and back in for the group change to take effect.

Clone the starter repository into /tmp:

cd /tmp
git clone https://github.com/LRMPUT/fossbot-text-to-cmd.git
cd fossbot-text-to-cmd

💡 Tip: This is the same repository used in Lab 3. If you completed Lab 3, the classifiers in src/ are exactly the application we will containerise. If you skipped Lab 3 or did not finish, copy the reference solutions on top of the skeleton:
cp _solutions/classifier_sklearn.py src/classifier_sklearn.py
cp _solutions/classifier_st.py     src/classifier_st.py

Expected result: docker --version prints a version string, docker info runs without errors, and your prompt is inside the cloned fossbot-text-to-cmd/ directory.

Step 2 - How Docker actually works

This step is a short conceptual read - no commands to run yet. The goal is to give you a working mental model of what a container is, what it is not, and why this matters in practice.

Key terms

Image: a packaged filesystem snapshot plus metadata (entrypoint, default command, environment variables). It is read-only and reusable.
Container: a running instance of an image. You can start, stop and remove containers; they leave the image untouched.
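Layer: an image is built up from stacked filesystem layers. Each Dockerfile instruction that changes the filesystem (such as RUN and COPY) produces one layer; instructions that only set metadata (such as CMD) do not. Layers are cached and shared between images.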
Registry: a server that stores images (e.g. Docker Hub at https://hub.docker.com).

Containers vs virtual machines

Both let you run “another system” on top of your host, but they are optimised for different things and have different trade-offs. Neither is universally better.

Aspect	Virtual Machine	Docker container
What is virtualised	The whole computer, including its own kernel	Just the userspace - applications and libraries
Guest OS	Any OS (Linux, Windows, BSD, …) regardless of host	Same family as host - on Linux you run Linux containers
Isolation strength	Strong - hypervisor enforces separate kernels and memory	Process-level - all containers share the host kernel
Resource overhead	Higher - each VM boots and runs its own OS	Lower - no kernel to boot, no driver stack to load
Configuration	OS installer + manual setup, or a pre-built image	A short text recipe (`Dockerfile`)
Persistence model	VM keeps its disk and state across reboots	Containers are short-lived by default; persistent data lives in volumes
Networking	Each VM gets its own virtual network adapter	Containers share the host kernel’s network stack, with namespaces for isolation

Where Docker has a clear advantage over virtual machines

Describe an environment as a short text recipe. A Dockerfile is a readable list of commands; the same recipe builds an equivalent image every time, provided the base image tag and the package versions it installs are pinned. With a VM you usually install an OS and click through configuration screens, then snapshot the result - much harder to keep in version control or to diff between two versions.
Share and reuse parts of an environment. Docker images are made of layers; if two images share a base layer it is stored once. You can pull a 200 MB image even if it conceptually contains “all of Ubuntu” because the Ubuntu layer is reused. VM images are monolithic disks - you copy the whole thing every time.
Start a fresh, isolated environment quickly, typically in under a second. A container is just a process tree with its own filesystem view; there is no kernel to boot. That makes it cheap to create a new container per test, per build, per pull request.
Run many lightweight instances side by side. Because containers share the host kernel and start fast, you can run tens or hundreds on a single workstation - one per microservice, one per worker, one per CI job. Trying the same with VMs would exhaust RAM.
Distribute applications as a single artefact. A Docker image bundles the app, the Python version, the libraries, the system packages and the configuration. Anyone with Docker can run it with one command - no pip install, no “works on my machine” problems.

In one sentence: a VM virtualises the machine; a container packages the application’s environment.

What is actually inside an Ubuntu image?

When you docker pull ubuntu:24.04 you get the Ubuntu userspace - the filesystem layout, bash, apt, glibc, all the standard utilities. You do not get the Linux kernel. Containers share the kernel of the host.

That is why containers start so quickly (typically in under a second): there is no kernel to boot.

Concrete consequence: an Ubuntu 18.04 container running on an Ubuntu 24.04 host reports the host’s kernel:

docker run --rm ubuntu:18.04 uname -a
# Linux ...something... 6.8.0-117-generic ... (the host's kernel, not 18.04's)

The bash and apt inside the container come from Ubuntu 18.04, but uname reports the host’s kernel version. We will run this command for real in Step 3.

Docker on Windows and macOS

Docker containers are a Linux feature - they rely on Linux kernel facilities (namespaces, cgroups). So how can Docker also run on Windows and macOS?

Windows: Docker Desktop uses WSL2 (Windows Subsystem for Linux 2), which is a thin Linux VM with its own kernel. All your containers run inside that hidden Linux environment.
macOS: Docker Desktop runs a small Linux VM (through Apple's Virtualization framework on current releases; older releases used HyperKit).
Linux: Docker runs natively, with no virtualisation layer in between.

This means an ubuntu:24.04 image runs the same on every host, but on Windows and macOS there is an extra virtualisation hop. You pay a small performance and disk-space cost on those systems.

What if my image and my host use different Ubuntu versions?

Suppose your host runs Ubuntu 24.04 (kernel 6.x) and you run an ubuntu:18.04 container.

The image gives you Ubuntu 18.04 userspace - the bash, apt and libraries you would have had on Ubuntu 18.04.
The kernel is still the host’s 6.x kernel.

This works because the Linux kernel exposes a stable, backward-compatible system call interface. Programs compiled for kernel 4.x normally still run on kernel 6.x. The rare exceptions are programs that depend on very old, removed system calls.

The reverse direction (a newer image on an older host kernel - for example ubuntu:24.04 on a host with kernel 4.x) sometimes works but is riskier. The rule of thumb: the host kernel should be at least as new as the kernel the image was built for.

In practice the common case - running an older or equal-age userspace on a modern host kernel - works without any extra setup. This is one of the most useful Docker features: you can run “Ubuntu 18.04” or “Debian 12” containers on any modern Linux host without installing a second OS.

Why we care for this course

Robotics projects pile up dependencies fast: a specific Python version, a specific OpenCV build, ROS 2, CUDA, PyTorch. Containers let you freeze those dependencies into an image, share it with collaborators or copy it onto the robot, and reproduce the same environment everywhere. In the rest of this lab you will do exactly that for the text-to-cmd application from Lab 3.

Expected result: You can answer in your own words: “what is the difference between a container and a virtual machine?”, “what is inside a Docker image?” and “why does an ubuntu:18.04 container run on a 24.04 host?”. No screenshots to take in this step.

Step 3 - Your first container

Time to use Docker. You will run two small containers, learn the basic lifecycle commands (run, ps, exec, stop, rm) and verify the claim from Step 2 that a container uses the host’s kernel.

Run the canonical “hello world” container. This is the simplest possible check that Docker works end to end:

docker run hello-world

The first time you run it, Docker reports that the image is not available locally and pulls it from Docker Hub. Then it starts a container that prints a short message and exits. The image (hello-world) is tiny, on the order of ten kilobytes - the message is the entire application.

See what just happened. List the containers Docker remembers:

docker ps        # currently running containers - probably empty
docker ps -a     # all containers, including ones that have exited

docker ps -a should show one entry: the hello-world container with status Exited (0). Containers stick around after they finish so you can inspect logs or restart them. Remove the leftover with:

docker rm <CONTAINER_ID>

(use the first few characters of the ID - Docker accepts unique prefixes).

💡 Tip: Add --rm to docker run to auto-delete the container as soon as it exits, for one-off commands:
docker run --rm hello-world

Start an interactive Ubuntu shell. This pulls a real Ubuntu image (~80 MB) and drops you into a bash prompt inside it:

docker run -it --rm ubuntu:24.04 bash

-i keeps STDIN open
-t allocates a pseudo-TTY (so the shell behaves normally)
--rm auto-removes the container on exit
ubuntu:24.04 is the image (ubuntu is the name, 24.04 is the tag)
bash is the command to run inside the container

Your prompt should change to something like root@<container_id>:/#. You are now inside the container as root.

Look around inside the container. Try a few commands:

ls /
cat /etc/os-release    # confirms the userspace - "Ubuntu 24.04.x LTS"
dpkg -l | wc -l        # very small package count - this is a minimal Ubuntu
uname -a               # prints the HOST's kernel version, not the image's
exit                   # leaves the container; --rm deletes it

The uname -a result is the proof that containers share the host kernel: you are “inside Ubuntu 24.04” but the kernel version matches whatever your workstation runs.
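To make the comparison concrete, the same two facts can be collected on the host, outside any container. This is a minimal sketch; the variable names are illustrative, and it assumes a Linux host that provides /etc/os-release.

```shell
# Run this on the HOST (outside any container) and compare with the container's output.

# Kernel release: reported by the running kernel, which containers share with the host.
kernel_release=$(uname -r)

# Distribution name: read from a file in the root filesystem, so inside a container
# it comes from the image rather than from the host.
if [ -r /etc/os-release ]; then
    distro_name=$(. /etc/os-release && echo "${PRETTY_NAME:-unknown}")
else
    distro_name="unknown (no /etc/os-release)"
fi

echo "kernel:    ${kernel_release}"
echo "userspace: ${distro_name}"
```

The kernel line should match what uname -a printed inside the container, while the userspace line differs whenever the image is a different distribution or release than the host.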

Try an older Ubuntu to see the cross-version effect. Repeat the experiment with an older image:

docker run --rm ubuntu:18.04 bash -c "cat /etc/os-release | head -2 && uname -a"

The first two lines of /etc/os-release should say Ubuntu 18.04. uname -a still reports your host kernel. You just ran an Ubuntu 18.04 userspace on top of your modern kernel without installing a second OS.

Run something useful in the background. Start a small web server container in detached mode:

docker run -d --name myweb -p 8088:80 nginx:alpine

-d runs the container detached (returns immediately, container keeps running)
--name myweb gives the container a memorable name
-p 8088:80 maps port 80 inside the container to port 8088 on the host
nginx:alpine is a small image (a few tens of MB) with the nginx web server on Alpine Linux

Check that it is running, then verify the web server responds:

docker ps                       # should show myweb, status "Up ..."
curl http://localhost:8088      # nginx welcome page (HTML)

📸 Capture for submission: screenshot the terminal showing the docker ps output (including myweb) together with the curl http://localhost:8088 output, while the container is still running.

💡 Tip: If you see an error like address already in use, another program on your workstation is already listening on that host port. Pick a different port (e.g. -p 8089:80) and re-run. Don’t forget to docker rm myweb first if the previous attempt left a stopped container behind.
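Before picking a replacement port you can check whether something is already listening on it. The sketch below is bash-specific: it uses bash's built-in /dev/tcp pseudo-path, and the function name port_in_use is made up for this example. It only probes 127.0.0.1, so a service bound to a different interface would go unnoticed.

```shell
# Bash-only: /dev/tcp/<host>/<port> is a pseudo-path handled by bash itself.
# Opening it attempts a TCP connection, which succeeds only if something is listening.
# The subshell closes the connection again as soon as it exits.
port_in_use() {
    (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

for port in 8088 8089; do
    if port_in_use "$port"; then
        echo "port ${port}: already in use"
    else
        echo "port ${port}: free"
    fi
done
```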

Enter the running container to look around without stopping it:

docker exec -it myweb sh
# inside the container:
ls /usr/share/nginx/html        # nginx default site files
exit

Stop and remove the container when you are done:

docker stop myweb
docker rm myweb
docker ps -a                    # confirm it is gone

Expected result: You have run containers from three different images (hello-world, two Ubuntu releases and the nginx web server), used the lifecycle commands run, ps, exec, stop and rm, and confirmed that the kernel inside an Ubuntu container is your host’s kernel.

Step 4 - Build a custom image

So far you have run images that other people published. In this step you will package the text-to-cmd application from Lab 3 into your own Docker image.

What is a Dockerfile?

A Dockerfile is a text recipe that tells docker build how to construct an image, one step at a time. Each instruction that changes the filesystem adds a new layer on top of the previous one. Because the recipe is plain text it can live in your repository alongside the code, and rebuilding from it gives an equivalent image as long as the base image and package versions it refers to do not change.

The instructions you will use here:

Instruction	What it does
`FROM <image>`	Start from an existing image (your base). Every Dockerfile starts with `FROM`.
`WORKDIR <path>`	Set the working directory inside the image. Equivalent to `cd` for later layers.
`COPY <src> <dst>`	Copy files from your host (the build context) into the image.
`RUN <command>`	Execute a command inside the image at build time. The filesystem changes it makes become a new layer.
`CMD ["arg", ...]`	Default command that is run when someone does `docker run` without overriding it.

Write the Dockerfile

Make sure you are inside the starter directory, then create a file called Dockerfile at its top level:

cd /tmp/fossbot-text-to-cmd
nano Dockerfile

Type (or paste) the following content:

# Start from a small official Python image (Debian slim + Python 3.12)
FROM python:3.12-slim

# All subsequent paths are relative to /app inside the image
WORKDIR /app

# Install Python dependencies first, in two layers, so Docker can cache them
# Layer 1: CPU-only PyTorch (much smaller than the default CUDA build)
RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu

# Layer 2: the rest of the requirements
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code and the dataset
COPY src/  src/
COPY data/ data/

# Default command if someone just runs `docker run <image>`
CMD ["python", "-m", "src.text_to_wheels", "--help"]

Save (Ctrl+O, Enter) and exit (Ctrl+X).

💡 Tip: Why install torch first and then requirements.txt? Because the heaviest layer (the PyTorch download) almost never changes, so Docker can re-use it from cache on every subsequent build. The requirements.txt layer is smaller and will rebuild only when you change that file. We will look at caching again at the end of this step.

Build the image

docker build -t fossbot-text-to-cmd:latest .

-t <name>:<tag> tags the image with a name (fossbot-text-to-cmd) and a version tag (latest).
The . at the end is the build context - the directory whose contents are sent to the Docker daemon. Files outside this directory cannot be COPYed.

The first build takes a few minutes - most of the time is spent downloading the CPU-only PyTorch wheel and the other Python packages. Docker prints a numbered line for each build step; you should see seven of them ([1/7] to [7/7]), one for every instruction except CMD, which only records metadata and runs nothing at build time.

Inspect what you built

docker images fossbot-text-to-cmd          # the image and its size
docker history fossbot-text-to-cmd:latest  # the layers, newest first (top) to oldest (bottom)

The image is around 1.5 GB - most of it is PyTorch and its dependencies. The docker history output shows the layers and the size each one added.

Run the image

Run the default CMD (which prints the CLI help):

docker run --rm fossbot-text-to-cmd:latest

Now override the default command and process the sample input file that lives inside the image:

docker run --rm fossbot-text-to-cmd:latest \
    python -m src.text_to_wheels \
    --input data/examples/basic.txt \
    --output /tmp/result.json \
    --classifier sklearn

The output JSON is written to /tmp/result.json inside the container. Because we did not mount any host directory, the file disappears with the container - that is what Step 5 is going to fix.

Layer caching - rebuild to see it work

Run the build again without changing anything:

docker build -t fossbot-text-to-cmd:latest .

This time it finishes in seconds. Docker recognised that every instruction had the same inputs as before and reused all cached layers. Now edit src/wheel_mapping.py (for example change 0.5 to 0.6 in the forward action), then rebuild:

docker build -t fossbot-text-to-cmd:latest .

Only the layers from COPY src/ src/ onwards rebuild - the heavy RUN pip install layers stay cached because the files they depend on (requirements.txt, the index URL) did not change. That is why the order of instructions in a Dockerfile matters: expensive, rarely changing steps go at the top, and cheap, frequently changing steps go at the bottom.
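For COPY, Docker decides whether a cached layer is still valid by looking at the copied files themselves rather than at their modification times. The sketch below imitates that idea with sha256sum on a throwaway file. It illustrates the principle only and is not how Docker computes its cache keys internally; the file content (FORWARD_SPEED) is invented for the demo.

```shell
workdir=$(mktemp -d)
printf 'FORWARD_SPEED = 0.5\n' > "$workdir/wheel_mapping.py"

# Checksum of the file contents only (first field of sha256sum's output).
checksum() { sha256sum "$1" | cut -d' ' -f1; }

before=$(checksum "$workdir/wheel_mapping.py")

# Touching the file changes its timestamp but not its contents.
touch "$workdir/wheel_mapping.py"
after_touch=$(checksum "$workdir/wheel_mapping.py")

# Editing the file changes its contents.
printf 'FORWARD_SPEED = 0.6\n' > "$workdir/wheel_mapping.py"
after_edit=$(checksum "$workdir/wheel_mapping.py")

[ "$before" = "$after_touch" ] && echo "touch only:   same checksum -> a cached layer could be reused"
[ "$before" != "$after_edit" ] && echo "content edit: new checksum  -> the layer has to be rebuilt"

rm -rf "$workdir"
```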

Expected result: The terminal shows the running container printing CLI help, and a second run that finishes successfully and would have written /tmp/result.json inside the container. docker images lists fossbot-text-to-cmd with a tag of latest.

📸 Capture for submission: screenshot the terminal showing the last few lines of docker build (with the line naming to docker.io/library/fossbot-text-to-cmd:latest) followed by docker images fossbot-text-to-cmd.

Step 5 - Volumes and bind mounts

In the previous step you wrote a JSON result to /tmp/result.json inside the container - and it disappeared with the container the moment it exited. In real use you almost always want one of two things instead:

Bind mount: take a directory or a file on your host and make it visible inside the container at a chosen path. The container reads and writes the same files you can read and write on the host.
Volume: a directory managed by Docker, living somewhere under /var/lib/docker/. You give it a name, mount it into one or more containers, and Docker handles where the bytes actually live.

Roughly:

Feature	Bind mount	Volume
Where it lives	A path on your host that you choose	Managed by Docker, hidden under `/var/lib/docker/`
Created by	`-v <absolute_host_path>:<container_path>` - Docker sees a path on the left and binds it	`docker volume create <name>`, or implicitly by `-v <volume_name>:<container_path>` - Docker sees a bare name on the left and uses a managed volume
Best for	Sharing source code or data with the container, editing files on the host	Persistent state between container runs (databases, model caches)
Survives	As long as you do not delete the host directory	Until you `docker volume rm` it
Downsides	Tied to a host path, less portable	Less convenient to inspect from the host

In this step you will use both.
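The rule from the "Created by" row can be written down as a tiny classifier. This is a simplified sketch of the rule of thumb, not Docker's actual argument parser, and mount_kind is a name made up for this example.

```shell
# Simplified classifier for the left-hand side of '-v <source>:<target>'.
mount_kind() {
    source_part=${1%%:*}                    # everything before the first colon
    case "$source_part" in
        /*|./*|../*) echo "bind mount" ;;   # looks like a path on the host
        *)           echo "named volume" ;; # bare name, managed by Docker
    esac
}

mount_kind /tmp/host-output:/output        # prints: bind mount
mount_kind text-to-cmd-output:/output      # prints: named volume
```

Relative paths such as ./dir are included because Compose files commonly use them; with docker run this lab sticks to absolute paths.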

Bind mount: read and write host files from the container

Create an output directory on the host and remember its path:

mkdir -p /tmp/host-output

Run the application with a bind-mounted output directory so the JSON ends up on the host:

docker run --rm \
    -v /tmp/host-output:/output \
    fossbot-text-to-cmd:latest \
    python -m src.text_to_wheels \
        --input data/examples/basic.txt \
        --output /output/sklearn_basic.json \
        --classifier sklearn

-v <host_path>:<container_path> is the bind-mount flag. Both paths must be absolute.
/tmp/host-output on your workstation is mounted at /output inside the container.
The application writes /output/sklearn_basic.json from its point of view - which is /tmp/host-output/sklearn_basic.json on the host.

Confirm the file is on the host:

ls /tmp/host-output/
cat /tmp/host-output/sklearn_basic.json | head -10

Bind a single input file to override the dataset baked into the image. Create your own input on the host:

cat > /tmp/host-input.txt <<'EOF'
forward
turn left
halt
EOF

Then mount it into the container as the input file:

docker run --rm \
    -v /tmp/host-input.txt:/input.txt \
    -v /tmp/host-output:/output \
    fossbot-text-to-cmd:latest \
    python -m src.text_to_wheels \
        --input /input.txt \
        --output /output/custom_result.json \
        --classifier st

Check the result:

cat /tmp/host-output/custom_result.json

The container processed YOUR file even though it was never copied into the image. Bind mounts are how you give a containerised application its data without rebuilding.

Volume: state managed by Docker

A volume is useful when you want persistent state that is not tied to a specific host path - for example a cache that several containers should share, or model files you do not want to re-download on every container start.

Create a named volume:

docker volume create text-to-cmd-output
docker volume ls
docker volume inspect text-to-cmd-output

The inspect output shows the on-disk location (under /var/lib/docker/volumes/). You normally do not touch that path directly - you just refer to the volume by name.

Use the volume by mounting it the same way as a bind mount, but with the volume name on the left side of the colon:

docker run --rm \
    -v text-to-cmd-output:/output \
    fossbot-text-to-cmd:latest \
    python -m src.text_to_wheels \
        --input data/examples/basic.txt \
        --output /output/in_volume.json \
        --classifier sklearn

The result is in the volume, not on a host path you chose. Run a second throwaway container to read it back:

docker run --rm \
    -v text-to-cmd-output:/output \
    fossbot-text-to-cmd:latest \
    cat /output/in_volume.json

The same volume mounted into two different container runs gave you persistent state without creating any files in a host directory of your own.

Remove the volume when you are done with it (the file inside disappears with it):

docker volume rm text-to-cmd-output
docker volume ls

Expected result: cat /tmp/host-output/sklearn_basic.json prints valid JSON, cat /tmp/host-output/custom_result.json shows the predictions for your own three-line input file, and docker volume ls no longer lists text-to-cmd-output after you removed it.

Step 6 - docker-compose

Up to now you have started containers one at a time with long docker run commands. Real applications usually consist of several services running together (a frontend + an API + a database, for example), and even single-service apps benefit from having their run configuration written down so you do not have to remember the right flags every time.

That is what Compose is for: a YAML file describes one or more services, and docker compose up starts them all with their volumes, environment variables and dependencies wired up correctly.

Write the compose file

Create a file called docker-compose.yml at the top of the starter directory (same place as the Dockerfile):

cd /tmp/fossbot-text-to-cmd
nano docker-compose.yml

Paste in the following:

services:
  basic-sklearn:
    image: fossbot-text-to-cmd:latest
    volumes:
      - ./compose-output:/output
    command: >
      python -m src.text_to_wheels
      --input data/examples/basic.txt
      --output /output/basic_sklearn.json
      --classifier sklearn

  basic-st:
    image: fossbot-text-to-cmd:latest
    volumes:
      - ./compose-output:/output
    command: >
      python -m src.text_to_wheels
      --input data/examples/basic.txt
      --output /output/basic_st.json
      --classifier st

What this says:

services: is the top-level key. Each entry below it defines one service, which Compose runs as a container.
Each service has a name (basic-sklearn, basic-st) and reuses the image you built in Step 4.
Both services bind-mount the same host directory ./compose-output at /output inside the container. The directory is created automatically if it does not exist yet.
command: overrides the image’s default CMD. The > makes YAML fold the next indented lines into one string, so the long invocation stays readable.

💡 Tip: Even though the YAML key is volumes:, the entry ./compose-output:/output is a bind mount - the left side starts with ./, which Compose treats as a host path. A bare name like mydata:/output would refer to a managed volume that must also be declared in a top-level volumes: section. Same rule as for docker run -v from Step 5.

Run everything with one command

docker compose up

Compose pulls or reuses the image, creates the two containers, starts them in parallel and streams their stdout to your terminal, each line prefixed with the service name. The containers run, write their JSON files, and exit. Compose returns control once both services are done.

Check that both result files landed on the host:

ls compose-output/
cat compose-output/basic_sklearn.json | head -10
cat compose-output/basic_st.json | head -10

Inspect what Compose did

The containers it just ran are now stopped but still listed:

docker compose ps        # running containers of this compose project - empty now, both have exited
docker compose ps -a     # all containers of the project, including the ones that exited

You can also re-run them without recreating from scratch:

docker compose up      # starts the existing containers again; recreates only those whose configuration or image changed

Tear down

docker compose down

This stops and removes the containers and the default network Compose created for them. The image stays on disk; the ./compose-output directory and its files also stay (it is a host bind mount).

Switch the storage to a named volume and chain in reader services

Now redo the same exercise but with a managed volume instead of a host directory, and add two extra services that consume the JSON files that the first two services produce. This shows three things at once: how to declare a top-level volume, how depends_on orders services, and how containers share data through a volume without anything appearing on the host filesystem.
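The ordering that depends_on provides with condition: service_completed_successfully (a reader starts only after its producer has exited with status 0) has a rough analogue in the shell's && operator. The sketch below uses hypothetical producer and reader functions and a temporary directory standing in for the shared volume; it is an analogy, not a description of how Compose schedules services.

```shell
shared_dir=$(mktemp -d)      # stands in for the shared volume

# Hypothetical producer: writes a result file and exits with status 0.
producer() { echo '{"status": "done"}' > "$shared_dir/result.json"; }

# Hypothetical producer that fails before writing anything.
failing_producer() { return 1; }

# Hypothetical reader: prints the file the producer wrote.
reader() { cat "$shared_dir/result.json"; }

# The reader runs only because the producer completed successfully.
output_ok=$(producer && reader)

# Here the reader never starts, so nothing is printed.
# The trailing '|| true' only keeps this demo from aborting under 'set -e'.
rm -f "$shared_dir/result.json"
output_failed=$(failing_producer && reader) || true

echo "after successful producer: ${output_ok}"
echo "after failed producer:     '${output_failed}'"

rm -rf "$shared_dir"
```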

Replace the contents of docker-compose.yml with the skeleton below and complete the TODO sections yourself:

services:
  basic-sklearn:
    image: fossbot-text-to-cmd:latest
    volumes:
      - text-to-cmd-output:/output
    command: >
      python -m src.text_to_wheels
      --input data/examples/basic.txt
      --output /output/basic_sklearn.json
      --classifier sklearn

  basic-st:
    image: fossbot-text-to-cmd:latest
    volumes:
      - text-to-cmd-output:/output
    command: >
      python -m src.text_to_wheels
      --input data/examples/basic.txt
      --output /output/basic_st.json
      --classifier st

  reader-sklearn:
    image: fossbot-text-to-cmd:latest
    # TODO 1: mount the named volume at /output (same as the producers above).
    # Docs + example: https://docs.docker.com/reference/compose-file/services/#short-syntax-5
    volumes:
    # TODO 2: write the command that prints the sklearn JSON file to stdout.
    # Docs + example: https://docs.docker.com/reference/compose-file/services/#command
    command:
    depends_on:
      basic-sklearn:
        condition: service_completed_successfully

  reader-st:
    image: fossbot-text-to-cmd:latest
    # TODO 3: same volume mount as in reader-sklearn.
    volumes:
    # TODO 4: same idea as TODO 2 but for the ST result file.
    command:
    depends_on:
      basic-st:
        condition: service_completed_successfully

# TODO 5: declare the named volume that all four services mount above.
# Docs + example: https://docs.docker.com/reference/compose-file/volumes/
volumes:

Hint - reference solution

  reader-sklearn:
    image: fossbot-text-to-cmd:latest
    volumes:
      - text-to-cmd-output:/output
    command: cat /output/basic_sklearn.json
    depends_on:
      basic-sklearn:
        condition: service_completed_successfully

  reader-st:
    image: fossbot-text-to-cmd:latest
    volumes:
      - text-to-cmd-output:/output
    command: cat /output/basic_st.json
    depends_on:
      basic-st:
        condition: service_completed_successfully

volumes:
  text-to-cmd-output:
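The ordering that depends_on imposes can be pictured as a dependency graph. The sketch below uses Python's standard `graphlib` to group the four services into start "waves"; it illustrates the ordering idea only and is not how Compose is implemented. `start_waves` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter

# service -> services it depends on, copied from the compose file above
depends_on = {
    "basic-sklearn": [],
    "basic-st": [],
    "reader-sklearn": ["basic-sklearn"],
    "reader-st": ["basic-st"],
}

def start_waves(graph):
    """Group services into waves; every service in a wave has all of its
    dependencies in earlier waves, so a wave could start in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()                      # raises CycleError on circular deps
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

print(start_waves(depends_on))
```

The waves are a simplification: with condition: service_completed_successfully each reader waits only for its own producer, so reader-sklearn may start before basic-st has finished.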

Run it:

docker compose up

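Each reader’s output is streamed to your terminal prefixed with the service name, so you see the contents of both JSON files printed inline. Nothing is written to the host this time: compose-output/ (if the directory still exists from the earlier run) gets no new files, and docker volume ls lists the new volume. Compose prefixes the volume name with the project name, so it appears as <project>_text-to-cmd-output, for example fossbot-text-to-cmd_text-to-cmd-output if the project folder is fossbot-text-to-cmd.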

📸 Capture for submission: after docker compose up of the second compose file finishes, capture a screenshot of docker compose ps -a (showing all four services with Exited (0)) together with the prompt where you ran it. The JSON content does not need to be in the screenshot - the Exited (0) state of all four services is what proves the pipeline ran end-to-end.

Tear down everything, including the volume this time:

docker compose down --volumes
docker volume ls

The volume is gone, the containers are gone, the JSON files that lived in the volume are gone. The image and the host ./compose-output from the earlier exercise are untouched.

Expected result: After the first compose file, ls compose-output/ shows basic_sklearn.json and basic_st.json with valid JSON content. After the second compose file, the readers stream the JSON to your terminal during docker compose up, no new files appear in compose-output/, and docker volume ls lists the text-to-cmd-output volume (prefixed with the Compose project name) until you tear it down with --volumes.

Step 7 - GPU passthrough

By default a container cannot see the host GPU - its processes are isolated from the /dev/nvidia* devices and from the userspace driver libraries. To make the GPU visible inside the container you need two things on the host:

An NVIDIA driver installed and working (nvidia-smi runs from the host shell).
The NVIDIA Container Toolkit installed and registered as a Docker runtime.

💡 Tip: If you do not have an NVIDIA GPU on your machine (AMD/Intel only, or a non-Linux host without GPU passthrough configured), read through this step but skip the commands - the rest of the lab does not depend on a working GPU.

Verify the host has everything in place:
```
nvidia-smi
docker info | grep -i "runtime"
```
The first command must print a table with your GPU, driver version, and CUDA version. The second command must list nvidia among the available runtimes (alongside the default runc). If nvidia-smi works but docker info does not list nvidia, you are missing the NVIDIA Container Toolkit - install it from the official guide and rerun.
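The runtime check can also be scripted. The sketch below assumes that `docker info --format '{{json .Runtimes}}'` prints the registered runtimes as a JSON object keyed by runtime name; `has_nvidia_runtime` and `query_runtimes` are hypothetical helper names, and the sample string is illustrative rather than captured from a real machine.

```python
import json
import subprocess

def has_nvidia_runtime(runtimes_json):
    """True if the JSON object of Docker runtimes has an 'nvidia' key."""
    return "nvidia" in json.loads(runtimes_json)

def query_runtimes():
    """Ask the local Docker daemon for its runtimes as JSON text.
    Requires a working docker CLI; not called in the demo below."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .Runtimes}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Illustrative sample, not captured from a real machine.
sample = '{"nvidia": {"path": "nvidia-container-runtime"}, "runc": {"path": "runc"}}'
print(has_nvidia_runtime(sample))
```

On a lab workstation you would call `has_nvidia_runtime(query_runtimes())` instead of using the sample.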
Run a CUDA base image without the --gpus flag and try to call nvidia-smi from inside:
```
docker run --rm nvidia/cuda:12.5.0-base-ubuntu22.04 nvidia-smi
```
The command fails, typically with an error such as exec: "nvidia-smi": executable file not found in $PATH. The CUDA base image does not ship nvidia-smi: the binary, the host’s NVIDIA driver libraries (libnvidia-ml.so.1) and the /dev/nvidia* device nodes all come from the host and are not visible inside the container - the container is isolated from the host’s GPU stack until something injects them.
Add the --gpus all flag and rerun the same image:
```
docker run --rm --gpus all nvidia/cuda:12.5.0-base-ubuntu22.04 nvidia-smi
```
Now nvidia-smi runs inside the container and prints the same table you saw on the host - GPU model, driver version, CUDA version, memory, and the (empty) process list of the container. The container sees the GPU because the NVIDIA Container Toolkit injected the driver libraries and device nodes at startup.

Compose has its own syntax for the same thing. The equivalent of --gpus all in a docker-compose.yml service is:

services:
  gpu-job:
    image: nvidia/cuda:12.5.0-base-ubuntu22.04
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

You can replace count: all with count: 1 to reserve a single GPU, or drop count and use device_ids: ["0", "2"] to pick specific GPUs by index (count and device_ids are mutually exclusive). No need to test this snippet now - the syntax is just for your reference.

The fossbot-text-to-cmd image is CPU-only - rebuild a GPU variant. Our Dockerfile installs PyTorch from --index-url https://download.pytorch.org/whl/cpu (Step 4), which is the CPU-only build. Even with --gpus all, sentence-transformers would still run on CPU because the installed PyTorch does not have CUDA support compiled in. GPU passthrough is two-sided: the host must expose the GPU (toolkit + --gpus all), and the image must be built with a GPU-capable framework. Build a GPU variant of the image and verify it actually uses CUDA:
a. Open the Dockerfile and change only the PyTorch install line to use a CUDA wheel index. Pick a CUDA version supported by your driver from https://pytorch.org/get-started/locally/ - cu121 is a safe default for recent drivers. Example:
```
RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cu121
```
b. Build the GPU variant under a separate tag so the CPU image you already have stays usable:
```
docker build -t fossbot-text-to-cmd:gpu .
```
The CUDA wheels are a much larger download than the CPU build (about 2 GB), so the build takes a few minutes.
c. Before running the app, verify PyTorch inside the new image sees the GPU:
```
docker run --rm --gpus all fossbot-text-to-cmd:gpu \
  python -c "import torch; print('CUDA available:', torch.cuda.is_available()); print('Device:', torch.cuda.get_device_name(0))"
```
You should see CUDA available: True and your GPU model.
d. Now run the actual sentence-transformer classifier on GPU. Go back to Step 5 task 2 (the bind-mounted docker run that produced /tmp/host-output/sklearn_basic.json) and modify that command so it:
- uses the GPU image you just built (different tag than :latest),
- exposes the GPU to the container (the flag you learned in task 3),
- runs the st classifier instead of sklearn (sklearn does not use PyTorch, so GPU would not help),
- writes the output to a new filename so it does not overwrite the CPU result.
Write the modified command yourself and run it. The output JSON should match the structure of the CPU run from Step 5 (same actions, same wheel speeds for the same inputs). To prove the GPU was actually used, open a second terminal before launching the run and start a continuous monitor:
```
nvidia-smi -l 1    # refresh every 1 s; Ctrl+C to stop
# or: watch -n 0.5 nvidia-smi
```
Then trigger the docker run in the first terminal. While the container is running you should see a python process appear in the Processes: section of nvidia-smi, with a few hundred MB of GPU memory used. The process disappears as soon as the container exits.
Hint - reference solution
```
docker run --rm --gpus all \
    -v /tmp/host-output:/output \
    fossbot-text-to-cmd:gpu \
    python -m src.text_to_wheels \
        --input data/examples/basic.txt \
        --output /output/st_gpu.json \
        --classifier st
```
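To check that the GPU output "matches the structure" of the CPU run without comparing values by eye, the two JSON documents can be reduced to their shape (keys, nesting, value types) and compared. This is a sketch only: the helper names are hypothetical, and the field names in the sample entries are assumptions for illustration, not the application's documented schema.

```python
import json

def shape(value):
    """Reduce a JSON value to its structure: keys and type names, no data."""
    if isinstance(value, dict):
        return {key: shape(item) for key, item in value.items()}
    if isinstance(value, list):
        return [shape(item) for item in value]
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "number"
    return type(value).__name__

def same_structure(a, b):
    """True if both JSON values have the same keys, nesting and value types."""
    return shape(a) == shape(b)

def same_structure_files(path_a, path_b):
    """Load two JSON files and compare their structure."""
    with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
        return same_structure(json.load(fa), json.load(fb))

# Illustrative entries; the field names are assumptions, not the app's schema.
cpu = [{"action": "forward", "wheels": {"left": 0.5, "right": 0.5}}]
gpu = [{"action": "forward", "wheels": {"left": 1, "right": 1}}]
print(same_structure(cpu, gpu))
```

On the workstation the call would be along the lines of `same_structure_files("/tmp/host-output/sklearn_basic.json", "/tmp/host-output/st_gpu.json")`.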

Expected result: nvidia-smi runs successfully inside the nvidia/cuda:... container when --gpus all is passed, and fails or is missing when it is not. The output of the in-container nvidia-smi matches the host’s nvidia-smi for driver/CUDA version and GPU model. The fossbot-text-to-cmd:gpu image prints CUDA available: True (task 5c) and the GPU run from task 5d produces an output JSON identical in structure to the CPU run, with a python process visible in nvidia-smi on the host while the container is running.

Step 8 - VSCode Dev Containers

Up to now every container has been a runtime sandbox - you build an image, the container runs the application once, exits, and you never edit code from inside it. A dev container turns the picture inside out: VSCode itself runs as a thin client on the host, but the workspace, the Python interpreter, the debugger, and every command you type in the integrated terminal live inside the container. Editing a .py file feels the same as editing it on the host, except the runtime that executes it is the one from your Dockerfile. This guarantees that your dev environment matches whatever runs in production - no “works on my machine”.

You drive everything from a single configuration file: .devcontainer/devcontainer.json. VSCode reads it, builds (or reuses) the image, mounts your project folder into the container as the workspace, attaches an editor server inside, and finally drops you into a VSCode window that looks identical to a normal one - only the bottom-left status bar shows Dev Container: ... to remind you where you are.

Install the Dev Containers extension in VSCode (publisher: Microsoft, extension ID ms-vscode-remote.remote-containers). Either click the Extensions icon and search for “Dev Containers”, or run from a host terminal:
```
code --install-extension ms-vscode-remote.remote-containers
```
💡 Tip: If you already have Microsoft’s Remote Development extension pack installed (ms-vscode-remote.vscode-remote-extensionpack), you do not need to install Dev Containers separately - the pack bundles it along with Remote-SSH, Remote-Tunnels and WSL. The pack is heavier but useful if you also work over SSH or inside WSL.

Create .devcontainer/devcontainer.json at the root of fossbot-text-to-cmd with the contents below. The three things this config does:

builds the dev container image from the existing Dockerfile you wrote in Step 4 (no second Dockerfile),
installs the Microsoft Python extension automatically the first time the container is created,
gives the container a recognisable name shown in the VSCode status bar.

{
    // Human-readable name shown in the VSCode status bar when the
    // container is open. Anything would work; we match the project folder.
    "name": "fossbot-text-to-cmd",

    // Build the dev container from the project's Dockerfile.
    // "../Dockerfile" - this devcontainer.json sits in .devcontainer/, so ".."
    //   points one level up to the project root where the Dockerfile lives.
    // ".." - build context = project root, so the Dockerfile's COPY steps see
    //   the whole project (src/, data/, requirements.txt, ...).
    "build": {
        "dockerfile": "../Dockerfile",
        "context": ".."
    },

    // Extensions installed inside the container on first open. The string is
    // the extension ID in the form "publisher.name" (visible on the
    // Marketplace page or in the Extensions panel), not the display name.
    // ms-python.python = official Microsoft Python extension (syntax,
    // debugger, linting). Add more IDs if you want them.
    "customizations": {
        "vscode": {
            "extensions": [
                "ms-python.python"
            ]
        }
    }
}
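devcontainer.json allows `//` comments, which Python's standard `json` module rejects. The sketch below strips line comments (leaving string contents alone) and then resolves the two build paths relative to the .devcontainer/ folder, to show why `".."` ends up at the project root. Both helpers are hypothetical; block comments and trailing commas are deliberately not handled.

```python
import json
import posixpath

def strip_line_comments(text):
    """Remove // comments that are outside of JSON strings."""
    out, in_string, escaped, i = [], False, False, 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text.startswith("//", i):
            while i < len(text) and text[i] != "\n":
                i += 1              # skip to end of line, keep the newline
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)

def build_paths(config, config_dir=".devcontainer"):
    """Resolve build.dockerfile and build.context relative to config_dir."""
    build = config["build"]
    dockerfile = posixpath.normpath(posixpath.join(config_dir, build["dockerfile"]))
    context = posixpath.normpath(posixpath.join(config_dir, build.get("context", ".")))
    return dockerfile, context

sample = '''
{
    // name shown in the status bar
    "name": "fossbot-text-to-cmd",
    "build": {
        "dockerfile": "../Dockerfile",  // relative to .devcontainer/
        "context": ".."
    }
}
'''
config = json.loads(strip_line_comments(sample))
print(config["name"], build_paths(config))
```

Both paths resolve to the project root (`Dockerfile` and `.`), which is why the Dockerfile's COPY steps see the whole project.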

Reopen the folder in the container. Make sure VSCode is open on fossbot-text-to-cmd. Then open the Command Palette (Ctrl+Shift+P on Linux/Windows, Cmd+Shift+P on macOS) and run Dev Containers: Reopen in Container. VSCode reuses the cached image layers from your previous docker build so the first start should take seconds rather than minutes. A progress notification in the bottom-right shows what is happening; you can click show log to watch the actual build steps.

When it finishes, the bottom-left of the window shows Dev Container: fossbot-text-to-cmd.
Open the integrated terminal (Ctrl+`). The shell prompt is now coming from inside the container - you are no longer on the host. Verify:
```
python --version
pwd
ls
python -m src.text_to_wheels --help
```
You should see Python 3.12 (the one from python:3.12-slim, not the system Python from your host), the working directory /workspaces/fossbot-text-to-cmd (the default, since the config above does not set workspaceFolder), the project files, and the --help output of the CLI.
Edit a file from the VSCode editor and see the change from the in-container terminal. Open src/text_to_wheels.py in VSCode, add a print("hello from devcontainer") line at the top of the file, save it (Ctrl+S), then in the container terminal run:
```
python -m src.text_to_wheels --help | head -3
```
The print appears. The host folder is bind-mounted into the container by VSCode, so edits propagate immediately in both directions.
What survives a rebuild? Your workspace files live on the host (bind-mounted into the container at /workspaces/fossbot-text-to-cmd), but anything you install inside the container with pip or apt lives in the container’s writable layer and is erased when the container is rebuilt. Verify both halves:
1. In the container terminal, install a package that is not in requirements.txt:
```
pip install requests
python -c "import requests; print(requests.__version__)"
```
  It works.
2. Rebuild the container: Command Palette → Dev Containers: Rebuild Container. Wait for VSCode to reload.
3. In the new container terminal, retry the import:
```
python -c "import requests"     # ModuleNotFoundError - the pip install is gone
```
  Your workspace files (src/, data/, requirements.txt…) are untouched - they live on your host, the rebuild only replaces the container’s filesystem.
The pip install only affected the container’s writable layer and disappeared with the rebuild. To make a package permanent, add it to requirements.txt and rebuild - the RUN pip install -r requirements.txt step in your Dockerfile will pick it up.
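A small Python check makes the before/after comparison repeatable: run it in the container terminal before and after the rebuild. `is_installed` is a hypothetical helper; it only reports whether the current interpreter can find a top-level module.

```python
import importlib.util

def is_installed(module_name):
    """True if the current interpreter can find the top-level module."""
    return importlib.util.find_spec(module_name) is not None

for name in ("json", "requests"):
    print(f"{name}: {'available' if is_installed(name) else 'missing'}")
```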

Expected result: Dev Container: fossbot-text-to-cmd is visible in the bottom-left of the VSCode window. The integrated terminal runs Python from the container image, sees the project files, and reflects edits made in the VSCode editor instantly. After a rebuild, in-container pip installs are gone but workspace edits remain.

📸 Capture for submission: screenshot of the VSCode window showing (1) Dev Container: fossbot-text-to-cmd in the bottom-left status bar, (2) the integrated terminal with the output of python --version and python -m src.text_to_wheels --help, (3) src/text_to_wheels.py open in an editor tab.

💡 Tip: devcontainer.json has a second mode - instead of build.dockerfile you can use dockerComposeFile + service to point at a compose file and name one of its services as your dev environment. The other services in the same compose file come up alongside.

Step 9 - Optional bonus: run the container on the FOSSBot

If your FOSSBot:v2 has Docker installed, you can ship an image built on the workstation to the robot and run the classifier on the robot itself - no registry and no build on the robot, just a transfer.

The lab FOSSBot:v2 platform is a Raspberry Pi 5 (8 GB RAM, ARM64) running Ubuntu Server 24.04 LTS. SSH into it as:

hostname: fossbotrpi1.local
user: admin
password: !F055b0t

The fossbot-text-to-cmd:latest image you built in Step 4 is the CPU variant, which is exactly what you want here: small, fast to transfer, and self-contained.

Architecture caveat: docker save | docker load copies the bytes as-is; it does not cross-compile. Your workstation is x86_64 and the lab Pi is ARM64 - if you ship the workstation build straight to the robot, the image will load but fail to start with exec format error. Rebuild the image for ARM64 before shipping:
If you completed Step 7 task 5a, revert the change in /tmp/fossbot-text-to-cmd/Dockerfile. Find the line:

RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cu121

and change cu121 back to cpu:

RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu

The CUDA wheel only exists for x86_64, and the FOSSBot has no NVIDIA GPU anyway.

Register cross-platform emulators on the workstation (one-time, uses QEMU under the hood):

docker run --privileged --rm tonistiigi/binfmt --install all

Build for ARM64 and reuse the same :latest tag:

docker buildx build --platform linux/arm64 -t fossbot-text-to-cmd:latest --load .

The build runs through QEMU emulation and is significantly slower than a native build - expect 10-30 minutes for the torch and sentence-transformers wheels.
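The architecture mismatch behind the caveat can be expressed as a small lookup. The sketch below maps common machine names (as reported by `uname -m` or Python's `platform.machine()`) to Docker platform strings; the table is a partial, illustrative mapping and both helper names are hypothetical.

```python
import platform

# Common machine names mapped to Docker platform strings (partial list).
DOCKER_PLATFORMS = {
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",
    "aarch64": "linux/arm64",
    "arm64": "linux/arm64",
}

def docker_platform(machine):
    """Map a machine name to a Docker platform string, or None if unknown."""
    return DOCKER_PLATFORMS.get(machine.lower())

def needs_cross_build(build_machine, target_machine):
    """True if an image built natively on build_machine cannot run on target_machine."""
    build, target = docker_platform(build_machine), docker_platform(target_machine)
    if build is None or target is None:
        raise ValueError("unknown machine name")
    return build != target

print("this host:", docker_platform(platform.machine()))
print("workstation -> Pi needs cross build:", needs_cross_build("x86_64", "aarch64"))
```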

Ship the image to the robot as a single pipe - docker save writes a tarball to stdout, docker load on the other end reads it from stdin:
```
docker save fossbot-text-to-cmd:latest | ssh admin@fossbotrpi1.local docker load
```

Run the classifier on the robot, writing the JSON to a host path on the robot:

ssh admin@fossbotrpi1.local "mkdir -p /tmp/out && docker run --rm \
    -v /tmp/out:/output \
    fossbot-text-to-cmd:latest \
    python -m src.text_to_wheels \
    --input data/examples/basic.txt \
    --output /output/basic_sklearn.json \
    --classifier sklearn"

--rm makes the container delete itself once the run finishes. The JSON stays on the robot under /tmp/out/basic_sklearn.json.

Drive the wheels. Feed the JSON into your robot driver from Lab 2 - each entry’s wheels: {left, right} field maps directly to the speeds the driver expects. The mapping itself is in src/wheel_mapping.py so you can read out the speed for any predicted action.
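A minimal sketch of that hand-over, under stated assumptions: the output file is taken to be a JSON list of entries, each with a `wheels` object holding `left` and `right`, and `set_speeds` is a stand-in for whatever call your Lab 2 driver exposes. The sample data is illustrative, not real classifier output.

```python
import json

def wheel_commands(entries):
    """Yield (left, right) speeds from entries shaped like
    {"wheels": {"left": ..., "right": ...}} (assumed layout)."""
    for entry in entries:
        wheels = entry["wheels"]
        yield float(wheels["left"]), float(wheels["right"])

def drive(entries, set_speeds):
    """Send every wheel command to set_speeds, a stand-in for the Lab 2 driver call."""
    count = 0
    for left, right in wheel_commands(entries):
        set_speeds(left, right)
        count += 1
    return count

# Illustrative data; the real file is /tmp/out/basic_sklearn.json on the robot.
sample = json.loads(
    '[{"action": "forward", "wheels": {"left": 0.5, "right": 0.5}},'
    ' {"action": "left", "wheels": {"left": -0.3, "right": 0.3}}]'
)
drive(sample, lambda left, right: print(f"left={left:+.2f} right={right:+.2f}"))
```

Passing the driver call in as a function keeps the JSON handling testable on the workstation, where no motors are attached.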
Remove the image from the robot when you no longer need it (the workstation copy stays untouched):
```
ssh admin@fossbotrpi1.local docker rmi fossbot-text-to-cmd:latest
```

The full lifecycle - build once on the workstation, ship as a tarball, run on the robot, clean up - is the same pattern you would use to deliver a containerised application to any machine without a registry.

Step 10 - Cleanup

Leave the dev container (if Step 8 is still open): in VSCode open the Command Palette and run Dev Containers: Reopen Folder Locally. The window reloads as a normal VSCode window on the host.
Tear down everything you created in this lab in one chained command. From any directory:
```
docker compose -p fossbot-text-to-cmd down --volumes 2>/dev/null; \
docker rm -f myweb 2>/dev/null; \
docker image rm fossbot-text-to-cmd:latest fossbot-text-to-cmd:gpu 2>/dev/null; \
docker image rm $(docker images --filter "reference=vsc-fossbot-text-to-cmd*" -q) 2>/dev/null; \
rm -rf /tmp/fossbot-text-to-cmd /tmp/host-output /tmp/host-input.txt
```
The five sub-commands tear down (in order): the Compose stack and its named volume from Step 6, the standalone myweb container from Step 3, the two project images you built (:latest and the optional :gpu from Step 7), the VSCode-built dev container image (tagged vsc-fossbot-text-to-cmd-...), and the host directories.

Expected result: docker images | grep fossbot-text-to-cmd returns nothing, docker ps -a | grep myweb returns nothing, and /tmp/fossbot-text-to-cmd no longer exists. The shared base images (python:3.12-slim, nginx:alpine, …) are still on disk for the next student.

9. Analysis Questions

Bind mounts vs named volumes. What is the single key difference between a bind mount and a named volume that decides which one to use? Give one realistic situation for each.

After attempting it yourself, you may review the suggested answer

The key difference is portability. A bind mount is tied to the host filesystem - your compose file or docker run -v only works on machines where that exact host path exists. A named volume is referred to by name; Docker creates it locally on whichever machine runs the workload, so the same compose file works the same on dev, staging and production. Bind mount fits cases where the host path is the point - a dev workspace you edit live, an output folder you cat from the host. Named volume fits cases where the workload should run unchanged on any machine - a database’s storage, a shared model cache between containers.

Layer caching. Look at the order of instructions in the Dockerfile from Step 4: FROM, WORKDIR, RUN pip install torch, COPY requirements.txt, RUN pip install -r requirements.txt, COPY src/, COPY data/. If you moved COPY src/ above RUN pip install -r requirements.txt, how would the rebuild behaviour change the next time you only edited one .py file inside src/? Why?

After attempting it yourself, you may review the suggested answer

Editing a .py file would invalidate the cache for COPY src/, and Docker invalidates every layer after a changed one - so the expensive pip install -r requirements.txt would re-run on every rebuild. Rule: put stable + expensive layers (dependencies) before frequently-changing + cheap layers (source code).
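The invalidation rule can be simulated in a few lines of Python. This is a toy model of the rule stated above (a changed layer invalidates itself and everything after it), not of Docker's actual cache; `rebuilt_layers` is a hypothetical helper and the instruction lists are abbreviated from the question.

```python
def rebuilt_layers(instructions, changed_index):
    """Layers that must be rebuilt when the instruction at changed_index
    sees a change: that layer and every layer after it."""
    return instructions[changed_index:]

original = [
    "FROM", "WORKDIR", "RUN pip install torch", "COPY requirements.txt",
    "RUN pip install -r requirements.txt", "COPY src/", "COPY data/",
]
# Variant from the question: COPY src/ moved above the requirements install.
moved = [
    "FROM", "WORKDIR", "RUN pip install torch", "COPY requirements.txt",
    "COPY src/", "RUN pip install -r requirements.txt", "COPY data/",
]

for name, order in (("original", original), ("moved", moved)):
    print(name, "->", rebuilt_layers(order, order.index("COPY src/")))
```

In the original order only the two cheap COPY layers are rebuilt; in the moved order the expensive requirements install is rebuilt as well.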

Containers vs virtual machines. Step 2 listed concrete advantages of containers over VMs, but containers are not always the better choice. Name one task where a full VM is genuinely the better option despite slower boot time and larger size, and explain why containers are not enough there.

After attempting it yourself, you may review the suggested answer

A VM is needed whenever you require a different OS kernel from the host, kernel-level changes, or stronger isolation. Examples: running Windows on a Linux host, experimenting with kernel modules, or running untrusted code where the hypervisor boundary matters. Containers share the host kernel and cannot help with any of those.

GPU passthrough is two-sided. For a container to actually use the host GPU, two things must be set up. Name both sides and say who controls each - the image author or the person running the container.

After attempting it yourself, you may review the suggested answer

Host side - NVIDIA driver + NVIDIA Container Toolkit + --gpus all on docker run. Controlled by the person running the container.
Image side - the framework inside must be a GPU-capable build (e.g. PyTorch installed from the cu121 wheel index, not the cpu index). Controlled by the image author through the Dockerfile.

10. Submission Requirements

A screenshot of docker ps showing the myweb container alongside the curl http://localhost:8088 output from Step 3.
A screenshot of the final docker build lines (with the naming to docker.io/library/fossbot-text-to-cmd:latest line) and docker images fossbot-text-to-cmd from Step 4.
A screenshot of docker compose ps -a with all four services in Exited (0) from Step 6.
A screenshot of the VSCode dev container window from Step 8 showing the Dev Container: ... status bar, the integrated terminal, and an open editor tab on src/text_to_wheels.py.
Short answers to the four analysis questions.

11. References and Open Licence

Docker official documentation - https://docs.docker.com/
VSCode Dev Containers documentation - https://code.visualstudio.com/docs/devcontainers/containers
Compose file specification - https://docs.docker.com/reference/compose-file/

The Creative Commons Attribution 4.0 International (CC BY 4.0) license allows users to share, copy, distribute, and adapt the work, even for commercial purposes, as long as proper credit is given to the original creator.

EU funding disclaimer

Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Education and Culture Executive Agency (EACEA). Neither the European Union nor EACEA can be held responsible for them.