08 - Object tracking
Robotics I
Poznan University of Technology, Institute of Robotics and Machine Intelligence
Laboratory 8 - Object tracking
Goals
By the end of this lab you will:
- Understand how basic object tracking is performed
- Wire up a Kalman filter to track object motion
- Forecast each object’s position into the future
Resources
Note: All code must run inside our provided Docker container.
Preparation
We are going to use the Docker image from classes 06 and 07.
Today's class will not use ROS 2, so that we can fully focus on the tracking aspects; however, the same logic can be implemented inside the ROS 2 node that was prepared during last week's class.
To start, we need to download the video that we are going to work with today:
Start/Attach to the docker container
Download the video
mkdir object_tracking && cd object_tracking && wget "https://chmura.put.poznan.pl/s/vsm7FpGL5r9weT3/download" -O video.mp4
Stage 1 - Run object tracking
- Install the missing lap library, which is required for tracking in the Ultralytics framework:
pip3 install lap
Note: If the command fails (especially on Ubuntu 24.04), add the --break-system-packages flag to the command above.
Objective: See the object tracks
- We start with the modified Ultralytics object tracking example, which should be placed at the path /robotics1/object_tracking:
tracking_example.py
# Python
from collections import defaultdict
import cv2
import numpy as np
from ultralytics import YOLO

# Load the YOLO11 model
model = YOLO("/robotics1/src/robotics_object_detection/weights/yolo11n.pt")

# Open the video file
video_path = "video.mp4"
cap = cv2.VideoCapture(video_path)

# Store the track history
track_history = defaultdict(lambda: [])

# Loop through the video frames
while cap.isOpened():
    # Read a frame from the video
    success, frame = cap.read()

    if success:
        # Run YOLO11 tracking on the frame, persisting tracks between frames
        result = model.track(frame, persist=True)[0]

        # Get the boxes and track IDs
        if result.boxes and result.boxes.id is not None:
            boxes = result.boxes.xywh.cpu()
            track_ids = result.boxes.id.int().cpu().tolist()
        else:
            boxes, track_ids = [], []

        # Visualize the result on the frame
        annotated_frame = result.plot()

        # Plot the tracks
        for box, track_id in zip(boxes, track_ids):
            x, y, w, h = box
            track = track_history[track_id]
            track.append((float(x), float(y)))  # x, y center point
            if len(track) > 30:  # retain only the last 30 points per track
                track.pop(0)

            # Draw the tracking lines
            points = np.hstack(track).astype(np.int32).reshape((-1, 1, 2))
            cv2.polylines(annotated_frame, [points], isClosed=False, color=(230, 230, 230), thickness=10)

        # Display the annotated frame
        cv2.imshow("YOLO11 Tracking", annotated_frame)

        # Break the loop if 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    else:
        # Break the loop if the end of the video is reached
        break

# Release the video capture object and close the display window
cap.release()
cv2.destroyAllWindows()
- Run the script:
python3 tracking_example.py
- You should see the video with a white track following each detected object, indicating where this object was previously seen. If you have issues, please check that:
- the video file (video.mp4) and the tracking script (tracking_example.py) were placed at the correct path (/robotics1/object_tracking),
- you have the yolo11n.pt model weights downloaded at the path /robotics1/src/robotics_object_detection/weights/yolo11n.pt, which should have been done during the last classes,
- you ran xhost +local:root on the host machine if you just started a previously stopped container after a system reboot.
Stage 2 - Understanding the tracking pipeline
When multiple detections appear in each frame, the core challenge is data association: deciding which detection in the new frame corresponds to which existing track, i.e., to the set of previous detections of the same object. A common pipeline combines:
- Motion Prediction (e.g., via a Kalman Filter)
- Similarity Computation (e.g.,
intersection‐over‐union between predicted and observed boxes)
- Assignment (e.g., Hungarian algorithm)
A. Motion Prediction: Kalman Filter
- State: For each track, maintain a state
vector
\[ \mathbf{x} = [x, y, v_x, v_y]^\top \]
where \((x, y)\) is the object’s center and \((v_x, v_y)\) its velocity in the image frame. Sometimes the bounding box size is also included in the state, so that the size of the detected object is tracked as well.
- Prediction: Project the state forward by one time
step using the simple linear motion model
\[ \mathbf{x}_{k|k-1} = F\,\mathbf{x}_{k-1|k-1} \]
- Update: When a detection is associated with that track (i.e., we see the same object in the new image), correct the predicted state: \[ \mathbf{x}_{k|k} = \mathbf{x}_{k|k-1} + K \bigl(z_k - H\,\mathbf{x}_{k|k-1}\bigr) \]
In popular SORT (Simple Online and Realtime Tracking) approaches, a constant‐velocity Kalman Filter predicts each track’s bounding box before matching.
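To make the prediction step concrete, here is a minimal NumPy sketch (not part of the lab scripts) of the constant-velocity transition matrix and one predict step; the one-frame time step and the example numbers are assumptions used only for illustration.
# Python
import numpy as np

dt = 1.0  # assumed time step of one frame

# Constant-velocity transition matrix F for the state [x, y, v_x, v_y]
F = np.array([
    [1, 0, dt, 0],
    [0, 1, 0, dt],
    [0, 0, 1,  0],
    [0, 0, 0,  1]
], dtype=np.float32)

# Example state: object at (100, 50) moving 2 px right and 1 px down per frame
x = np.array([100.0, 50.0, 2.0, 1.0], dtype=np.float32)

# Prediction: x_{k|k-1} = F @ x_{k-1|k-1}
x_pred = F @ x  # -> [102., 51., 2., 1.]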
B. Similarity Measure: Intersection over Union (IoU)
- For each predicted box \(A\) and detected box \(B\), compute
\[ \mathrm{IoU}(A,B) = \frac{\mathrm{area}(A \cap B)}{\mathrm{area}(A \cup B)}. \]
- IoU values range from 0 (no overlap) to 1 (perfect overlap).
- Build a cost matrix \(C\) where
\[ C_{i,j} = 1 - \mathrm{IoU}\bigl(\text{pred}_i,\,\text{det}_j\bigr). \]
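For reference, here is a minimal sketch of how the IoU and the cost matrix could be computed, assuming boxes are given in (x1, y1, x2, y2) corner format; the helper names iou and build_cost_matrix are illustrative and not part of the provided scripts.
# Python
import numpy as np

def iou(box_a, box_b):
    # Boxes are assumed to be in (x1, y1, x2, y2) corner format
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def build_cost_matrix(pred_boxes, det_boxes):
    # C[i, j] = 1 - IoU(pred_i, det_j); rows = predicted track boxes, columns = detections
    C = np.zeros((len(pred_boxes), len(det_boxes)), dtype=np.float32)
    for i, p in enumerate(pred_boxes):
        for j, d in enumerate(det_boxes):
            C[i, j] = 1.0 - iou(p, d)
    return C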
C. Optimal Assignment: Hungarian Algorithm
- The Hungarian algorithm (Kuhn–Munkres) finds the minimum‐cost
matching between two sets.
- Input: cost matrix \(C\).
- Output: assignment of track indices to detection indices that minimizes total cost (i.e., maximizes total IoU).
- As a result, we get the best matching between the existing tracks and the currently detected objects.
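In practice, the matching is rarely implemented by hand; assuming SciPy is available in the container, linear_sum_assignment solves it directly. The cost values below are made up for illustration.
# Python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Example cost matrix for 2 tracks and 3 detections: C[i, j] = 1 - IoU(pred_i, det_j)
C = np.array([
    [0.1, 0.9, 0.8],
    [0.7, 0.2, 0.9],
])

track_idx, det_idx = linear_sum_assignment(C)
for i, j in zip(track_idx, det_idx):
    print(f"track {i} -> detection {j}, cost {C[i, j]:.2f}")
# track 0 -> detection 0, cost 0.10
# track 1 -> detection 1, cost 0.20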
Complete SORT Tracking Algorithm Workflow
- Predict each existing track’s bounding box with the
Kalman Filter.
- Compute the IoU‐based cost matrix between predicted
and detected boxes.
- Assign tracks ↔︎ detections via the Hungarian
algorithm.
- Update matched tracks (Kalman filter update
step).
- Initialize new tracks for unmatched
detections.
- Delete tracks that haven’t been matched for a preset number of frames (“lost” tracks).
The provided YOLO tracking example performs all of the above steps, which results in the white tracks drawn on the displayed images.
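For intuition only, here is a simplified sketch of how these steps could be combined into a single tracker update. The track representation (a dictionary with a box, a velocity, and a lost counter) is a crude stand-in for a per-track Kalman filter, and the sketch reuses the iou() helper from the section above; none of this is part of the provided example.
# Python
import numpy as np
from scipy.optimize import linear_sum_assignment

def sort_step(tracks, detections, iou_threshold=0.3, max_lost=10):
    # 1. Predict: shift every existing track's box by its estimated velocity
    for t in tracks:
        t["box"] = t["box"] + np.r_[t["vel"], t["vel"]]

    matched_tracks, matched_dets = set(), set()
    if tracks and detections:
        # 2. IoU-based cost matrix (iou() as sketched earlier)
        C = np.array([[1.0 - iou(t["box"], d) for d in detections] for t in tracks])
        # 3. Hungarian assignment
        rows, cols = linear_sum_assignment(C)
        for i, j in zip(rows, cols):
            if 1.0 - C[i, j] >= iou_threshold:  # reject weak matches
                # 4. Update the matched track (a real tracker would run the Kalman update here)
                new_box = np.asarray(detections[j], dtype=float)
                t = tracks[i]
                t["vel"] = 0.5 * t["vel"] + 0.5 * (new_box[:2] - t["box"][:2])
                t["box"] = new_box
                t["lost"] = 0
                matched_tracks.add(i)
                matched_dets.add(j)

    # 6. Age unmatched tracks and delete the ones lost for too long
    #    (done before step 5 so that brand-new tracks are not aged immediately)
    for i, t in enumerate(tracks):
        if i not in matched_tracks:
            t["lost"] += 1
    tracks[:] = [t for t in tracks if t["lost"] <= max_lost]

    # 5. Initialize new tracks for unmatched detections
    for j, d in enumerate(detections):
        if j not in matched_dets:
            tracks.append({"box": np.asarray(d, dtype=float), "vel": np.zeros(2), "lost": 0})
    return tracks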
Stage 3 - Future position prediction
Our objective today is to predict the future positions of the detected objects in the image, based on the current object positions and their motion history. The goal is to draw our predictions as arrows starting from the center of the current detection and pointing to the predicted position 10 frames ahead.
Here is a template to fill in: predict.py
# Python
#!/usr/bin/env python3
import cv2
import numpy as np
from ultralytics import YOLO

# Function to create a simple 4-state (x, y, dx, dy) Kalman filter
def create_kf():
    kf = cv2.KalmanFilter(4, 2)
    # State: [x, y, dx, dy]
    # Measurement: [x, y]
    kf.transitionMatrix = np.array([
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
        [0, 0, 0, 1]
    ], dtype=np.float32)
    kf.measurementMatrix = np.array([
        [1, 0, 0, 0],
        [0, 1, 0, 0]
    ], dtype=np.float32)
    # Tunable noise covariances
    kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2
    kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1
    kf.errorCovPost = np.eye(4, dtype=np.float32)
    return kf

def main():
    # Load the YOLO11 model
    model = YOLO("/robotics1/src/robotics_object_detection/weights/yolo11n.pt")

    # Open the video file
    cap = cv2.VideoCapture("video.mp4")

    # Dictionary to hold a KalmanFilter per track ID
    filters = {}

    while cap.isOpened():
        success, frame = cap.read()
        if not success:
            break

        # Run YOLO tracking to get stable track IDs
        result = model.track(frame, persist=True)[0]
        annotated = result.plot()  # draws bounding boxes + IDs

        # Extract center-based boxes (x, y, w, h) and track IDs
        if result.boxes and result.boxes.id is not None:
            boxes = result.boxes.xywh.cpu().numpy()  # shape (N, 4)
            ids = result.boxes.id.int().cpu().tolist()
        else:
            boxes, ids = [], []

        # TODO: Predict the next state of all objects we are currently tracking

        # Update or initialize a Kalman filter for each detection
        for (x, y, w, h), tid in zip(boxes, ids):
            # TODO: For each object we will have a separate Kalman filter. At this step either:
            #  - create a new filter (there is create_kf() at the top), or
            #  - select the filter belonging to the currently analyzed object

            # TODO: Correct the filter state based on the measurement

            # TODO: Predict the state 10 steps into the future; it has to be
            #       independent of the on-going prediction/correction

            # Draw an arrow from the current center to the predicted center
            cv2.arrowedLine(
                annotated,
                # TODO: current position,
                # TODO: future position,
                color=(0, 255, 0),
                thickness=2,
                tipLength=0.3
            )

        # Display the frame
        cv2.imshow("YOLO + 10-frame Kalman Prediction", annotated)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    # Cleanup
    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()
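As a hint for the TODOs, the snippet below demonstrates the generic cv2.KalmanFilter API on a standalone filter created with create_kf() from the template above. The measurement values are made up, and computing the 10-step look-ahead from the transition matrix (instead of calling predict() ten times) is only one possible approach.
# Python
import numpy as np
import cv2

kf = create_kf()  # create_kf() as defined in predict.py above

# Initialize the state with the first measured center (example values)
kf.statePost = np.array([[320.0], [240.0], [0.0], [0.0]], dtype=np.float32)

# Every frame: predict, then correct with the measured center
predicted = kf.predict()                      # internal one-step prediction
measurement = np.array([[323.0], [238.0]], dtype=np.float32)
corrected = kf.correct(measurement)           # refined state after seeing the detection

# A 10-step look-ahead computed from the transition matrix does not
# disturb the filter's own predict/correct cycle
F10 = np.linalg.matrix_power(kf.transitionMatrix, 10)
future_state = F10 @ kf.statePost             # [x, y, dx, dy] after 10 frames
future_xy = (int(future_state[0, 0]), int(future_state[1, 0]))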
The expected result should look similar to:
An example prediction of future object positions. Source: Own materials
💥 💥 💥 Assignment 💥 💥 💥
To pass the course, you need to upload the following file to the eKursy platform:
- a screenshot presenting the predicted positions of the objects 10 frames into the future, drawn as arrows on a frame of the provided video