Lab 03 - Introduction to Python - external libraries

Introduction to Python - external libraries

1. Activity Identity

Activity title	Introduction to Robotics
Topic	Python / AI / NLP
Authors	Institute of Robotics and Machine Intelligence: Dominik Belter, Jakub Chudziński, Marcin Czajka, Kamil Młodzikowski
Target learners	Bachelor (Computer Science / IT, Robotics)
Estimated duration	1.5 hours
Difficulty level	Beginner
FOSSBot environment	Hybrid
Licence	CC BY 4.0

2. Learning Objectives and Competences

ID	Learning outcome	Related competences	Assessment evidence
LO1	Students will be able to install and isolate external Python libraries using `venv` and `pip`.	Knowledge of Python tooling; selecting programming tools	Screenshot of the working virtual environment after `pip install`
LO2	Students will be able to implement a small classical text-classification pipeline with `scikit-learn` (TF-IDF + LogisticRegression).	Selecting programming tools; using libraries for designing perception components	Working `classifier_sklearn.py` and its JSON output on the basic test file
LO3	Students will be able to use a pretrained multilingual model from `sentence-transformers` to match text by meaning.	Using libraries for designing perception components; selecting programming tools	Working `classifier_st.py` and its JSON output on the multilingual test file

3. Prerequisites

A workstation running Linux with a working network connection.
Basic computer literacy: comfortable using a keyboard and mouse, opening applications, capturing screenshots.
Basic Python knowledge: variables, functions, lists, dictionaries, if/else.

4. Required Material and Setup

Category	Item	Version / Quantity	Notes
Hardware	Workstation	1 per student	Any Linux PC.
Software	Python 3.10+, `pip`, `venv`	bundled with most Linux distributions	Pre-installed on the lab workstations.
Software	`git`	bundled with most Linux distributions	Used to clone the starter repository.
Dataset / model	`paraphrase-multilingual-MiniLM-L12-v2`	downloaded on first run	Several hundred MB (roughly 470 MB of weights), cached after the first use. Hosted on Hugging Face.
Starter code	`fossbot-text-to-cmd`	from GitHub	Provides the CLI skeleton, dataset and `TODO` blocks you will fill in.

5. Safety, Ethics and Accessibility Notes

The only risks in this lab are operational:

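pip install can run arbitrary code from the package index (for example the build scripts of packages distributed as source). Always inspect requirements.txt before installing, and install only what this lab asks for: the CPU-only PyTorch build from the official PyTorch index and the requirements.txt provided with the starter.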
The virtual environment isolates dependencies from your system Python. Do not run pip install outside the activated venv unless you know what you are doing.

6. Scenario and Problem Statement

In this lab you will build a small command-line tool that takes a natural-language command (for example "go forward") and outputs the corresponding wheel motor speeds as JSON - the kind of format a low-level robot driver would consume.

You will implement two text-classifiers and compare them:

A classical machine-learning pipeline built from scratch with scikit-learn (TF-IDF + LogisticRegression).
A pretrained multilingual sentence-transformer used through similarity matching.

7. Lab Workflow

Phase	Student action	Expected output	Time
1. Setup	Create venv, install dependencies	Working environment	10 min
2. Classical ML	Implement TF-IDF + LogisticRegression in `classifier_sklearn.py`	sklearn classifier passes basic test	25 min
3. Pretrained AI	Implement similarity matcher in `classifier_st.py`	sentence-transformer classifier works	20 min
4. Experiments	Add the two multilingual runs and inspect the diff	4 JSON outputs	10 min
5. Understand	Read the conceptual explanation	Understand the role of features vs classifier	10 min
6. Bonus (optional)	Drive a physical robot with your classifier	Robot moves on text commands	-
7. Cleanup	Deactivate venv, remove starter directory	Clean `/tmp` for the next user	2 min
8. Reflection	Answer the analysis questions	Short answers	13 min

8. Step-by-Step Instructions

Step 1 - Environment preparation

💡 Lab workstation credentials. Every workstation in the lab uses the same local account: username put, password lrm.

Log in to your lab workstation.
Open a terminal (Ctrl+Alt+T on Ubuntu).
Clean up state from any previous lab session. Remove leftover screenshots and any starter directory from a previous run, so that your final submission contains only artefacts from this session and git clone does not fail because the destination directory already exists:

rm -rf ~/Pictures/Screenshots /tmp/fossbot-text-to-cmd

Clone the starter repository into /tmp:

git clone https://github.com/LRMPUT/fossbot-text-to-cmd.git /tmp/fossbot-text-to-cmd

💡 Tip: All lab work lives in /tmp/fossbot-text-to-cmd. /tmp is the conventional location for scratch work and is wiped on every reboot, so the workstation stays clean for the next user.

Create an isolated Python environment with venv and activate it:

python3 -m venv /tmp/fossbot-text-to-cmd/.venv
source /tmp/fossbot-text-to-cmd/.venv/bin/activate

After activation your prompt should change to start with (.venv). From now on, every python and pip command runs inside this environment, not the system one.

Install a CPU-only PyTorch first. By default pip would download the CUDA build of PyTorch (~5 GB of NVIDIA libraries). We do not need GPU support in this lab, so pull the much smaller CPU build from PyTorch’s dedicated index:

pip install torch --index-url https://download.pytorch.org/whl/cpu

Install the remaining dependencies declared in requirements.txt:

pip install -r /tmp/fossbot-text-to-cmd/requirements.txt

Together the two commands download roughly 900 MB of packages. The first install takes a few minutes.

Verify the install by importing each library:

python -c "import pandas, sklearn, sentence_transformers; print('OK')"

Expected result: The terminal prints OK. If anything fails to import, re-read the pip install output for an error and re-run the install.
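Optionally, you can also record which versions of the key libraries ended up in the venv, which helps when your numbers differ slightly from a classmate's. The sketch below uses only the standard library (importlib.metadata); the package list and the file name check_versions.py are suggestions, not part of the starter.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI. They can differ from import
# names: the "sklearn" module is distributed as "scikit-learn".
PACKAGES = ["torch", "pandas", "scikit-learn", "sentence-transformers"]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:22s} {ver or 'NOT INSTALLED'}")
```

Run it with python check_versions.py inside the activated venv. The CPU-only PyTorch build usually reports a version string ending in +cpu.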

📸 Capture for submission: screenshot the terminal showing the successful OK line and a prompt that starts with (.venv).

Step 2 - Classical ML classifier

You will now fill in the classifier skeleton from the starter repository you cloned in Step 1. Open /tmp/fossbot-text-to-cmd/src/classifier_sklearn.py in any editor and complete the TODOs.

The class builds a tiny but complete classical-ML text classifier from two scikit-learn building blocks:

TF-IDF (Term Frequency * Inverse Document Frequency) turns each text into a vector of numbers. Every dimension corresponds to one token that appears in the training data, where a token is one word or word pair and a command is one input text (a row of training_commands.csv). The value in each dimension is weighted by how often the token occurs in that command and how rare it is overall:
- TF (term frequency) - how many times a token appears in a command.
- IDF (inverse document frequency) - how rare a token is across all 74 commands. In its textbook form IDF = log(N / df) (N = 74, df = how many commands contain the token); scikit-learn's default is a smoothed variant, ln((1 + N) / (1 + df)) + 1, which behaves the same way. go is in 9 commands, so its weight is low; wait is in 1 command, so its weight is high.
- Each vector value is TF * IDF, so distinctive words like wait get a large value and common ones like go a small one. scikit-learn then rescales each command's vector to unit length (L2 normalisation), so long and short commands stay comparable.
LogisticRegression classifies a command’s TF-IDF vector. From the training data it learns one weight per token and action (e.g. left gets a large weight toward turn_left) plus one bias per action. For each action it adds up TF-IDF * weight across the tokens, plus that action's bias; the action with the highest total is the prediction. The example below leaves the biases out to keep the arithmetic short.
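The TF-IDF arithmetic can be reproduced by hand on a toy corpus. This sketch simplifies on purpose: the five commands are made up (not training_commands.csv), tokens come from a plain whitespace split (scikit-learn's default tokenizer also drops single-character tokens and punctuation), and only unigrams are used. It compares the textbook IDF with scikit-learn's smoothed default and applies the L2 normalisation that TfidfVectorizer uses by default.

```python
import math
from collections import Counter

# Made-up toy corpus, NOT the lab's training_commands.csv.
commands = ["go forward", "go back", "turn left", "go left", "wait"]
docs = [command.split() for command in commands]  # simple whitespace tokens
N = len(docs)

# Document frequency: in how many commands does each token occur?
doc_freq = Counter(tok for doc in docs for tok in set(doc))


def idf_textbook(tok):
    """IDF = log(N / df), natural logarithm."""
    return math.log(N / doc_freq[tok])


def idf_sklearn_default(tok):
    """Smoothed IDF used by TfidfVectorizer by default (smooth_idf=True)."""
    return math.log((1 + N) / (1 + doc_freq[tok])) + 1


def tfidf(text, idf):
    """TF * IDF for every known token, then scaled to unit (L2) length."""
    tf = Counter(text.split())
    raw = {tok: count * idf(tok) for tok, count in tf.items() if tok in doc_freq}
    norm = math.sqrt(sum(v * v for v in raw.values()))
    # Unknown words are skipped, so an all-unknown text stays empty (all zeros).
    return {tok: v / norm for tok, v in raw.items()} if norm else raw


for tok in ("go", "left", "wait"):
    print(f"{tok:5s} df={doc_freq[tok]}  textbook idf={idf_textbook(tok):.3f}  "
          f"smoothed idf={idf_sklearn_default(tok):.3f}")

print(tfidf("go left", idf_sklearn_default))
```

With the smoothed IDF, go left comes out as roughly go = 0.64, left = 0.77, which should match TfidfVectorizer() fitted on the same five commands: left is rarer here, so it gets the larger share. An unseen command such as fahre vorwärts produces an empty vector, which is exactly the situation Step 5 examines.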

Example - go left, with TF-IDF go = 0.4, left = 0.9. Each action’s score = 0.4 * weight(go) + 0.9 * weight(left):
- turn_left (weights go = 0.1, left = 2.0): 0.4 * 0.1 + 0.9 * 2.0 = 1.84
- forward (weights go = 0.3, left = 0.0): 0.4 * 0.3 + 0.9 * 0.0 = 0.12
- backward (weights go = 0.3, left = 0.0): 0.12
- turn_right (weights go = 0.1, left = 0.0): 0.04
- stop (weights go = 0.0, left = 0.0): 0.00
turn_left has the highest score, so it is the prediction.
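The worked example can be checked in a few lines of plain Python. The weights are the made-up numbers from the example, not values learned from the lab data; in a fitted LogisticRegression the real weights live in coef_ and the per-action biases in intercept_, which this sketch leaves out just like the example does.

```python
# TF-IDF values of the input "go left" (from the example).
tfidf_values = {"go": 0.4, "left": 0.9}

# Hypothetical per-action weights, copied from the example above.
weights = {
    "turn_left":  {"go": 0.1, "left": 2.0},
    "forward":    {"go": 0.3, "left": 0.0},
    "backward":   {"go": 0.3, "left": 0.0},
    "turn_right": {"go": 0.1, "left": 0.0},
    "stop":       {"go": 0.0, "left": 0.0},
}

# Score of each action = sum over tokens of TF-IDF * weight (no bias here).
scores = {
    action: sum(value * w.get(tok, 0.0) for tok, value in tfidf_values.items())
    for action, w in weights.items()
}
prediction = max(scores, key=scores.get)  # action with the highest score

for action, score in scores.items():
    print(f"{action:10s} {score:.2f}")
print("prediction:", prediction)
```

It prints the same five scores as the list above and picks turn_left. A real LogisticRegression additionally adds each action's bias and turns the scores into probabilities, which preserves the ranking.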

Work through the TODOs in order:

TODO 1 - build the pipeline. In __init__, create a sklearn Pipeline with two named steps - a TfidfVectorizer first, then a LogisticRegression:

self.pipeline = Pipeline([
    ("vec", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])

TODO 2 - load the CSV. Use pandas.read_csv to read the file pointed to by csv_path. The CSV has two columns: text and action.
TODO 3 - train / test split. Use train_test_split with test_size=0.2 and random_state=42 so the result is reproducible:

x_train, x_test, y_train, y_test = train_test_split(
    df["text"], df["action"], test_size=0.2, random_state=42
)

TODO 4 - fit the pipeline on the training split.
TODO 5 - print accuracy. Predict on the test set, compute the accuracy with accuracy_score, and print the result:

predictions = self.pipeline.predict(x_test)
print(f"Test accuracy: {accuracy_score(y_test, predictions):.3f}")

TODO 6 and 7 - predict on a single text. This one is a small synthesis challenge: Pipeline.predict and Pipeline.predict_proba have an API twist that you need to work around. Implement predict(self, text) so it returns an (action, confidence) tuple where:
- action is the predicted class label as a single string.
- confidence is the maximum class probability (a float between 0 and 1).

Hint - things to figure out from the docs

Both predict() and predict_proba() expect a list of texts, not a single string - so wrap the input as [text].
They both return a NumPy array even for a single input - take element [0] to extract the single result.
predict_proba() returns a 2D array of shape (n_samples, n_classes). Take row [0] (the only sample) and call .max() on it to get the highest probability.
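The twist can be observed directly on a toy pipeline. This sketch (made-up training phrases, default TfidfVectorizer settings) shows that a bare string is rejected and which shapes the two methods return for a one-element list; composing those pieces into predict(self, text) is still left to you.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy training data (made up, not the lab's CSV).
texts = ["go forward", "move ahead", "go back", "reverse",
         "turn left", "rotate left", "turn right", "rotate right",
         "stop", "halt"]
labels = ["forward", "forward", "backward", "backward",
          "turn_left", "turn_left", "turn_right", "turn_right",
          "stop", "stop"]

pipe = Pipeline([("vec", TfidfVectorizer()),
                 ("clf", LogisticRegression(max_iter=1000))])
pipe.fit(texts, labels)

# 1) A bare string is rejected: the vectorizer wants an iterable of documents.
try:
    pipe.predict("turn left")
except ValueError as err:
    print("bare string ->", err)

# 2) Wrapped in a list, both methods return arrays with one row per input.
labels_out = pipe.predict(["turn left"])   # shape (1,)
proba = pipe.predict_proba(["turn left"])  # shape (1, n_classes)
print(labels_out.shape, proba.shape)

# 3) The columns of predict_proba follow pipe.classes_.
print(dict(zip(pipe.classes_, np.round(proba[0], 3))))
```

The columns of predict_proba follow pipe.classes_, which scikit-learn stores in sorted order, so the largest column and the predicted label always refer to the same class.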

When the file is complete, run the classifier on the basic input file:

cd /tmp/fossbot-text-to-cmd && python -m src.text_to_wheels \
    --input data/examples/basic.txt \
    --output outputs/sklearn_basic.json \
    --classifier sklearn

Expected result: The terminal prints something like Test accuracy: 0.800 (random_state=42 fixes the split, but your exact number may differ slightly, for example with a different scikit-learn version) and Processed 7 commands with the 'sklearn' classifier. Open /tmp/fossbot-text-to-cmd/outputs/sklearn_basic.json and verify that each command was mapped to the right action (forward, turn_left, stop, …).

📸 Capture for submission: screenshot the terminal showing the test accuracy, the success message and a one-line-per-prediction summary of the result (so the screenshot fits on screen):
python -c "import json; [print(f\"{r['input']:22s} -> {r['action']:12s} ({r['confidence']:.3f})\") for r in json.load(open('/tmp/fossbot-text-to-cmd/outputs/sklearn_basic.json'))]"
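If you want to see TODO 1-5 working together outside the starter, the following standalone sketch runs the same steps on a tiny made-up CSV held in a string. The 20 rows are invented for illustration and are not training_commands.csv; in the lab you would pass csv_path to pd.read_csv instead of the StringIO object.

```python
from io import StringIO

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Made-up stand-in with the same two columns as training_commands.csv.
CSV_TEXT = """text,action
go forward,forward
move ahead,forward
drive forward,forward
go ahead please,forward
go back,backward
reverse,backward
move backwards,backward
drive back,backward
turn left,turn_left
rotate left,turn_left
go left,turn_left
left turn now,turn_left
turn right,turn_right
rotate right,turn_right
go right,turn_right
right turn now,turn_right
stop,stop
halt,stop
stay still,stop
stop now,stop
"""

df = pd.read_csv(StringIO(CSV_TEXT))  # load the text/action table

pipeline = Pipeline([
    ("vec", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Reproducible 80/20 split, then train on the 80 % part only.
x_train, x_test, y_train, y_test = train_test_split(
    df["text"], df["action"], test_size=0.2, random_state=42
)
pipeline.fit(x_train, y_train)

# Evaluate on the held-out 20 %.
predictions = pipeline.predict(x_test)
print(f"Test accuracy: {accuracy_score(y_test, predictions):.3f}")
```

With only 4 test rows the printed accuracy can only be 0.000, 0.250, 0.500, 0.750 or 1.000. The lab's own test split has the same granularity issue on a finer scale (steps of 1/15).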

Experiment with the classical pipeline

Once the classifier passes the basic test, try the following variations one at a time. Re-run after each change and note how the test accuracy changes.

Change ngram_range from (1, 2) to (1, 1) in your TfidfVectorizer. This drops bigrams - the model only sees individual words now. Does accuracy go up, down or stay the same? Think about why that might be for our short English commands.

Expected outcome

For our small dataset the accuracy on the held-out test set actually goes up with (1, 1) - around 0.93 versus 0.80 with (1, 2) or (1, 3). On a test split of only 15 commands, that is the difference between 14 and 12 correct answers. The English commands are short and the unigrams are already very informative on their own; adding bigrams introduces extra TF-IDF features that overfit the small training set and hurt generalisation on the test set. The takeaway: more features are not always better - on small datasets they can add noise rather than signal.

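Run the sklearn classifier on data/examples/multilingual.txt (the command above with --input data/examples/multilingual.txt and --output outputs/sklearn_multilingual.json). Almost every input collapses to the same predicted class; Step 5 explains why. Now change random_state=42 to another value (for example 7 or 0) and run it again. Does the same class still win the collapse, or did another class take over? What does that tell you about the role of random_state?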

Expected outcome

backward only wins the collapse for random_state=42. With random_state=0, 1, 3, 7 or 100 the dominant prediction switches to stop (15 of 15 multilingual inputs are predicted as stop). The bias-winner is an artefact of how the random shuffle happened to split the classes; it is not a property of the data, the classifier or the Polish/German/Spanish inputs. The mechanism (all-zero TF-IDF vector -> only biases matter) is identical for every seed.

Append a few Polish (or any language you choose) phrases for each action to /tmp/fossbot-text-to-cmd/data/training_commands.csv (e.g. do przodu,forward). Re-run on multilingual.txt. Did the sklearn classifier suddenly become multilingual? Why or why not?

Expected outcome

With three Polish phrases per action added, all five Polish inputs in multilingual.txt (do przodu, skręć w lewo, stop, do tyłu, skręć w prawo) are now classified correctly. The German and Spanish inputs are almost all still wrong, because their words are still missing from the TF-IDF vocabulary - they continue to produce all-zero vectors and collapse to whichever class wins the bias (now stop after the new training distribution). The lesson is concrete: TF-IDF cannot generalise beyond the languages and exact words in its training data. Full multilingual support would require training examples for every language you care about.

Revert the changes before moving on so the rest of the lab runs against the original setup.
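If you prefer scripting the Polish experiment to editing the CSV by hand, the sketch below appends (text, action) rows with Python's csv module. The phrases are example Polish commands chosen for illustration (three per action); the script assumes the CSV has the two columns text and action described in TODO 2, and it adds a newline first if the file does not end with one.

```python
import csv
from pathlib import Path

CSV_PATH = Path("/tmp/fossbot-text-to-cmd/data/training_commands.csv")

# Example Polish commands, three per action (illustrative choice).
POLISH_ROWS = [
    ("do przodu", "forward"), ("jedź do przodu", "forward"), ("naprzód", "forward"),
    ("do tyłu", "backward"), ("jedź do tyłu", "backward"), ("cofnij", "backward"),
    ("skręć w lewo", "turn_left"), ("w lewo", "turn_left"), ("obróć się w lewo", "turn_left"),
    ("skręć w prawo", "turn_right"), ("w prawo", "turn_right"), ("obróć się w prawo", "turn_right"),
    ("stop", "stop"), ("zatrzymaj się", "stop"), ("stój", "stop"),
]


def append_rows(csv_path, rows):
    """Append (text, action) rows to an existing two-column CSV file."""
    data = csv_path.read_bytes()
    with csv_path.open("a", encoding="utf-8", newline="") as f:
        if data and not data.endswith(b"\n"):
            f.write("\n")  # make sure the first new row starts on its own line
        csv.writer(f, lineterminator="\n").writerows(rows)


if __name__ == "__main__" and CSV_PATH.exists():
    append_rows(CSV_PATH, POLISH_ROWS)
    print(f"Appended {len(POLISH_ROWS)} rows to {CSV_PATH}")
```

Running it twice appends the rows twice. Because the starter is a git clone, git -C /tmp/fossbot-text-to-cmd checkout -- data/training_commands.csv restores the original CSV afterwards without touching your edits in the src/ files.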

Step 3 - Pretrained multilingual classifier

You will now fill in the second classifier skeleton. Open /tmp/fossbot-text-to-cmd/src/classifier_st.py in any editor and complete the TODOs.

This classifier needs no training data of its own. It uses a pretrained multilingual model from the sentence-transformers library that already understands the meaning of sentences in around 50 languages.

The idea:

A pretrained model maps any sentence to a fixed-length vector (an embedding). Sentences with similar meaning end up close to each other in the embedding space, regardless of language.
For each action we keep a small list of reference phrases. These are defined at the top of classifier_st.py as the TEMPLATES dictionary - three English phrases per action. You do not need to modify it:

TEMPLATES = {
    "forward":    ["go forward", "forward", "ahead"],
    "backward":   ["go back", "backwards", "reverse"],
    "turn_left":  ["turn left", "left", "rotate left"],
    "turn_right": ["turn right", "right", "rotate right"],
    "stop":       ["stop", "halt", "stay still"],
}

We pre-encode all 15 template phrases once in __init__, store the resulting vectors and reuse them on every call. Encoding the templates fresh on every predict() would be wasteful.
To classify a new input, we encode it and pick the action whose templates are closest to it, measured by cosine similarity.
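The matching rule in the last bullet can be sketched with NumPy on made-up 3-dimensional vectors; the real model outputs 384 numbers per sentence, and the vectors here are invented for illustration. The cosine function computes the same quantity as sklearn.metrics.pairwise.cosine_similarity, which the starter's hints use and which works on 2D arrays instead of single vectors.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity of two 1-D vectors: a.b / (|a| * |b|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Made-up 3-D "embeddings" standing in for the model's 384-D output.
template_embeddings = {
    "forward": np.array([[0.9, 0.1, 0.0],     # e.g. "go forward"
                         [0.8, 0.2, 0.1]]),   # e.g. "ahead"
    "stop":    np.array([[0.0, 0.1, 0.9],     # e.g. "stop"
                         [0.1, 0.0, 0.8]]),   # e.g. "halt"
}
user_vec = np.array([0.85, 0.15, 0.05])       # stands in for the encoded input

# Score of an action = similarity of its best-matching template.
scores = {
    action: max(cosine(user_vec, row) for row in embeddings)
    for action, embeddings in template_embeddings.items()
}
best = max(scores, key=scores.get)
print({a: round(s, 3) for a, s in scores.items()}, "->", best)
```

Taking the maximum over an action's templates means a single close template is enough for that action to score highly.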

Work through the TODOs in order:

TODO 1 - load the model. In __init__, instantiate the pretrained SentenceTransformer model:

self.model = SentenceTransformer(model_name)

The first call downloads the model (several hundred MB, roughly 470 MB of weights) and caches it under ~/.cache/huggingface/.

TODO 2 - pre-encode the templates. For each action in TEMPLATES, encode its template phrases once and store the resulting matrix in self.template_embeddings:

for action, phrases in TEMPLATES.items():
    self.template_embeddings[action] = self.model.encode(phrases)

TODO 3, 4, 5 - predict. The main coding challenge. Implement predict(self, text) so it returns an (action, similarity) tuple where action is the action whose templates are most similar to the input and similarity is the cosine similarity of that best match.

Hint - composition pattern

Encode the input with self.model.encode([text]). Wrap the text in a list: encode then returns a 2D array of shape (1, 384), which is the shape cosine_similarity expects.
Loop over self.template_embeddings.items(). For each action, compute cosine_similarity(user_vec, embeddings) and take .max() as that action’s score.
Track the best action and best score across the loop, and return them as a tuple at the end.

When the file is complete, run the classifier on the same basic input:

cd /tmp/fossbot-text-to-cmd && python -m src.text_to_wheels \
    --input data/examples/basic.txt \
    --output outputs/st_basic.json \
    --classifier st

Expected result: The terminal prints Processed 7 commands with the 'st' classifier. and /tmp/fossbot-text-to-cmd/outputs/st_basic.json looks similar to the sklearn output - every command was mapped to the right action.

📸 Capture for submission: screenshot the terminal showing the success message and the compact summary of the result:
python -c "import json; [print(f\"{r['input']:22s} -> {r['action']:12s} ({r['confidence']:.3f})\") for r in json.load(open('/tmp/fossbot-text-to-cmd/outputs/st_basic.json'))]"

Experiment with the sentence-transformer

Try the following one at a time and observe what changes:

Reduce the templates. Keep only one phrase per action in TEMPLATES (delete the other two). Re-run on multilingual.txt. Does accuracy hold up, or do some inputs now miss? Why might that be?

Expected outcome

Accuracy holds up - on the standard multilingual.txt you still get 15 out of 15 correct even with a single template per action. The pretrained model already understands meaning, so a single anchor phrase per action is enough; the multilingual capability comes from the model itself, not from the number of templates. Extra phrases mainly help on tricky or ambiguous inputs that are equally close to several actions - they widen the “catchment area” without changing the language coverage.

Test your own languages. Append 2-3 phrases in another language you know (Italian, French, Russian, …) to /tmp/fossbot-text-to-cmd/data/examples/multilingual.txt. Re-run and check the predictions. Did the model handle the new languages?

Expected outcome

The model typically handles them well - paraphrase-multilingual-MiniLM-L12-v2 was pretrained on around 50 languages, so Italian (“avanti”, “indietro”), French (“avance”, “arrête”), Russian (“вперёд”, “стой”), Czech (“dopředu”) and Portuguese (“para frente”) all map to the right action without any extra work. The occasional miss happens on rarer words, but the coverage is impressive for a free, compact model that you did not have to “tell” about any of those languages.

Swap the model. Change model_name to "all-MiniLM-L6-v2" (an English-only sibling of our multilingual model). Re-run on both basic.txt and multilingual.txt. What happens? What does this tell you about where the multilingual capability lives?

Expected outcome

On basic.txt (English inputs) the English-only model still works perfectly - it knows English and the templates are English. But multilingual.txt collapses to roughly 4 out of 15: a couple of Polish words happen to be close enough to English ones for the model to recover, but German and Spanish drop almost entirely. The lesson: the multilingual behaviour lives in the pretrained model, not in our templates or in the classifier logic on top. Matching only works if the model places the inputs and the templates in a shared embedding space; a model that only learned English cannot place Polish, German or Spanish commands near their English templates, so only English inputs continue to work.

Revert the changes before moving on.

Step 4 - Compare the two approaches

You have two working classifiers and two example files. Now run all four combinations and look at the differences:

cd /tmp/fossbot-text-to-cmd

# Sklearn on basic.txt
python -m src.text_to_wheels \
    --input data/examples/basic.txt \
    --output outputs/sklearn_basic.json \
    --classifier sklearn

# Sklearn on multilingual
python -m src.text_to_wheels \
    --input data/examples/multilingual.txt \
    --output outputs/sklearn_multilingual.json \
    --classifier sklearn

# Sentence transformer on basic.txt
python -m src.text_to_wheels \
    --input data/examples/basic.txt \
    --output outputs/st_basic.json \
    --classifier st

# Sentence transformer on multilingual
python -m src.text_to_wheels \
    --input data/examples/multilingual.txt \
    --output outputs/st_multilingual.json \
    --classifier st

Compare the four output files. The expected pattern is:

Input	Classifier	Expected accuracy
`basic.txt`	sklearn	high (~7/7)
`basic.txt`	st	high (~7/7)
`multilingual.txt`	sklearn	low (~2/15)
`multilingual.txt`	st	high (~14-15/15)

Look at /tmp/fossbot-text-to-cmd/outputs/sklearn_multilingual.json - the sklearn classifier collapses to almost always predicting the same action with very low confidence. Compare with /tmp/fossbot-text-to-cmd/outputs/st_multilingual.json, where Polish, German and Spanish commands all map to the correct action.

📸 Capture for submission: screenshot the compact summaries of both multilingual outputs printed back to back. The command below runs the one-liner once per file:

echo "=== sklearn ===" && python -c "import json; [print(f\"{r['input']:22s} -> {r['action']:12s} ({r['confidence']:.3f})\") for r in json.load(open('/tmp/fossbot-text-to-cmd/outputs/sklearn_multilingual.json'))]" && echo "=== st ===" && python -c "import json; [print(f\"{r['input']:22s} -> {r['action']:12s} ({r['confidence']:.3f})\") for r in json.load(open('/tmp/fossbot-text-to-cmd/outputs/st_multilingual.json'))]"
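As an alternative to the two chained one-liners, the sketch below prints both multilingual results in one side-by-side table and counts how often the classifiers agree. It assumes, as the one-liners do, that each output file is a JSON list of objects with input, action and confidence fields, and additionally that both files list the commands in the same order (they were produced from the same input file). The file name compare_outputs.py is only a suggestion.

```python
import json
from pathlib import Path

OUT = Path("/tmp/fossbot-text-to-cmd/outputs")


def load(path):
    """Read one output file: a JSON list of {"input", "action", "confidence", ...}."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def side_by_side(left, right):
    """Pair up the two result lists; both must list the same inputs in order."""
    if len(left) != len(right):
        raise ValueError("the files contain a different number of commands")
    rows = []
    for a, b in zip(left, right):
        if a["input"] != b["input"]:
            raise ValueError(f"inputs out of order: {a['input']!r} vs {b['input']!r}")
        rows.append((a["input"], a["action"], a["confidence"],
                     b["action"], b["confidence"]))
    return rows


if __name__ == "__main__":
    sk_path, st_path = OUT / "sklearn_multilingual.json", OUT / "st_multilingual.json"
    if sk_path.exists() and st_path.exists():
        rows = side_by_side(load(sk_path), load(st_path))
        print(f"{'input':22s} {'sklearn':>20s} {'st':>20s}")
        for text, a1, c1, a2, c2 in rows:
            print(f"{text:22s} {a1:>12s} ({c1:.3f}) {a2:>12s} ({c2:.3f})")
        same = sum(row[1] == row[3] for row in rows)
        print(f"Same action from both classifiers: {same}/{len(rows)}")
    else:
        print("Output files not found in", OUT, "- run the four commands above first")
```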

Step 5 - Why these approaches differ on multilingual input

Every ML classifier operates on numbers. Text must first be turned into a vector. The whole difference between our two approaches is how that conversion happens, not which algorithm consumes the result afterwards.

How sklearn TF-IDF represents text

Step 1 - build a vocabulary. TfidfVectorizer reads the training commands (inside our pipeline only the training split of training_commands.csv, because the whole pipeline is fitted on x_train) and builds a dictionary of unique tokens. The vocabulary might look like:

vocab = ["go", "forward", "ahead", "turn", "left", "right", "back", "stop", ...]
                                  (say a few hundred unique tokens)

Step 2 - one vector per text. Each text becomes a vector with one dimension per vocabulary token (so a few hundred dimensions for our vocabulary). For "go forward":

[1, 1, 0, 0, 0, 0, 0, 0, ...]
 ^  ^
 |  └── "forward" is present
 └───── "go" is present
        all other positions are zero

(In practice the values are not 1 or 0 but TF * IDF weights; the principle is the same.)

Step 3 - the classifier learns. LogisticRegression learns rules like “when the positions for go and forward are set, the answer is forward”.

What happens with "fahre vorwärts"?

TF-IDF looks up "fahre" in the vocabulary    -> NOT THERE (never seen)
TF-IDF looks up "vorwärts" in the vocabulary -> NOT THERE (never seen)

Vector: [0, 0, 0, 0, 0, 0, ..., 0]   (all zeros)

The classifier receives an all-zero vector. There is no signal to act on, so every such input gets the same prediction. Which one? LogisticRegression has learned a per-class bias (the intercept): a built-in tendency that says, in effect, “when nothing else is informative, this is my default guess”. With an all-zero input each class score is just its bias, so the class with the highest bias wins. Biases are not arbitrary; they were adjusted during training and roughly reflect how often each action appeared in the training split. With our exact data and random_state=42, the backward class ended up with the highest bias, which is why almost every multilingual input in Step 4 was predicted as backward with a confidence of around 0.25. The exact confidence value may differ by a few thousandths between scikit-learn versions, but the winning class is reproducible. With a different random seed a different class can win (in fact stop wins for most other seeds), but the mechanism is the same.
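The bias mechanism can be checked on a toy model. In this sketch (ten made-up commands with deliberately uneven class counts, not the lab data), an unseen German phrase is transformed into an all-zero TF-IDF row; the classifier's decision_function for that row is then exactly its intercept_ vector, and predict returns the class with the largest intercept. Which class wins here depends on the toy data and says nothing about the lab's backward result.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

texts = ["go forward", "move ahead", "go back", "reverse", "drive back",
         "turn left", "turn right", "stop", "halt", "stop now"]
labels = ["forward", "forward", "backward", "backward", "backward",
          "turn_left", "turn_right", "stop", "stop", "stop"]

pipe = Pipeline([("vec", TfidfVectorizer()),
                 ("clf", LogisticRegression(max_iter=1000))])
pipe.fit(texts, labels)
vec, clf = pipe.named_steps["vec"], pipe.named_steps["clf"]

x = vec.transform(["fahre vorwärts"])     # words never seen in training
print("non-zero TF-IDF entries:", x.nnz)  # 0 -> an all-zero vector

# With x = 0, every class score is just its bias: coef_ @ 0 + intercept_.
print(np.allclose(clf.decision_function(x), clf.intercept_))
print(dict(zip(clf.classes_, np.round(clf.intercept_, 3))))
print("prediction:", pipe.predict(["fahre vorwärts"])[0],
      "| largest intercept:", clf.classes_[np.argmax(clf.intercept_)])
```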

Key idea: TF-IDF only knows the words it saw during training. This is lexical matching (exact token matching), not semantic matching.

How sentence-transformers represents text

A sentence-transformer is a model that has already been trained, in our case paraphrase-multilingual-MiniLM-L12-v2. It was trained on billions of sentences in around 50 languages. During that pretraining it learned a crucial property: sentences with similar meaning get similar vectors - regardless of the language they are written in.

Step 1 - no vocabulary from our data. The model simply uses what it learned during pretraining.

Step 2 - each text becomes a fixed-length dense vector (a vector where every dimension carries a meaningful value, in contrast to TF-IDF where most are zero) that represents its meaning rather than the literal words (tokens) that make it up. The length of that vector is a property of the model: our paraphrase-multilingual-MiniLM-L12-v2 always produces 384 numbers. For illustration:

"go forward"     -> [ 0.20, -0.41,  0.56, ..., -0.11]   (384 numbers)
"do przodu"      -> [ 0.21, -0.43,  0.55, ..., -0.12]   very close
"fahre vorwärts" -> [ 0.22, -0.42,  0.54, ..., -0.13]   also close
"adelante"       -> [ 0.19, -0.40,  0.57, ..., -0.10]   also close
"stop"           -> [-0.15,  0.71, -0.22, ...,  0.34]   FAR from the four above
"halt"           -> [-0.16,  0.70, -0.23, ...,  0.33]   close to "stop"

# Note: a single number (e.g. 0.21) does not represent a word or a single
# concept. The 384 dimensions together form an abstract coordinate;
# meaning is encoded by the position of the whole vector in this space -
# sentences with similar meaning end up close to each other.

In this 384-dimensional embedding space, sentences cluster by meaning - English, Polish, German and Spanish forward-commands all land in the same neighbourhood, while "stop" sits elsewhere.

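Step 3 - cosine similarity measures the angle between vectors. Vectors pointing in nearly the same direction give a similarity close to 1; unrelated (roughly orthogonal) vectors give a value close to 0. The full range is -1 to 1.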
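For reference, the cosine similarity between the input embedding u and a template embedding v (384 components each for our model) is given below. The notation e(x) for the model's embedding of a text x and T_a for the template phrases of action a is introduced here for convenience; the second line restates the classification rule described in Step 3 of the lab.

```latex
\cos(\mathbf{u}, \mathbf{v})
  = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
  = \frac{\sum_{i=1}^{384} u_i v_i}{\sqrt{\sum_{i=1}^{384} u_i^2} \, \sqrt{\sum_{i=1}^{384} v_i^2}}
  \in [-1, 1]

\text{score}(a) = \max_{t \in T_a} \cos\bigl(e(x), e(t)\bigr), \qquad
\hat{a} = \arg\max_{a} \, \text{score}(a)
```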

What happens with "fahre vorwärts"?

1. The model encodes "fahre vorwärts" -> [0.22, -0.42, 0.54, ...]
   (the model understands the meaning - it IS a forward command, in German)

2. Compare to the template embeddings of every action:
   - similarity with "go forward"    = 0.93   <-- VERY HIGH
   - similarity with "go back"       = 0.21
   - similarity with "turn left"     = 0.18
   - similarity with "turn right"    = 0.17
   - similarity with "stop"          = 0.12

3. Highest similarity wins -> action = "forward"

This is semantic matching. The model never saw our specific templates during pretraining, but it learned on billions of sentence pairs that German "fahre vorwärts" and English "go forward" mean the same thing.

Side-by-side comparison

Aspect	TF-IDF (sklearn)	Sentence Transformer
Text representation	Sparse vector indexed by training-vocabulary tokens	Fixed-length dense vector representing meaning
Where does word knowledge come from?	Only from our 75-row training CSV	From pretraining on billions of multilingual sentences
Unseen word	Ignored (contributes 0)	Mapped to a meaningful position in the embedding space - the model breaks unknown words into sub-word units it has seen, so even brand-new words can be located near other texts with similar meaning
Languages	Only the language of the training data	Multilingual out of the box
What is “knowledge”?	Word statistics from our 75 examples	Meaning patterns learned from billions of sentences in many languages

The takeaway

The choice of classifier on top (LogisticRegression vs cosine similarity) is not what gives multilingual capability. The representation of the text - the features the classifier sees - is what makes the difference.

TF-IDF features only know the words from our training data. Pretrained sentence embeddings carry knowledge from a massive multilingual corpus. The same lesson applies across modern ML: it is often the features (and how you obtain them, for example by pretraining), not the algorithm, that decides whether a system works.

Step 6 - Optional bonus: drive a physical robot

If you have access to a robot, wire your classifier to a low-level wheel-control API to see the full text -> wheels -> motion pipeline live. A minimal sketch:

Read a command from input() or a file.
Call your predict() to obtain an action label.
Look up the wheel speeds in WHEEL_COMMANDS (in /tmp/fossbot-text-to-cmd/src/wheel_mapping.py).
Pass those speeds to your robot driver (for example the one you built in Lab 2) and watch the robot move.
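A minimal sketch of that loop follows. Everything robot-specific in it is a placeholder: classify stands in for your predict(), send_to_robot for your Lab 2 driver call, and the WHEEL_COMMANDS values are invented examples in [-1.0, 1.0]; check the real dictionary in src/wheel_mapping.py, whose structure may differ. The confidence threshold that falls back to stop is an optional design choice of this sketch, not part of the starter.

```python
# Hypothetical stand-in for WHEEL_COMMANDS in src/wheel_mapping.py:
# (left, right) wheel speeds in [-1.0, 1.0]. The real file may differ.
WHEEL_COMMANDS = {
    "forward":    (1.0, 1.0),
    "backward":   (-1.0, -1.0),
    "turn_left":  (-0.5, 0.5),
    "turn_right": (0.5, -0.5),
    "stop":       (0.0, 0.0),
}

MIN_CONFIDENCE = 0.5  # hypothetical threshold; tune it for your classifier


def classify(text):
    """Placeholder for your classifier's predict(text) -> (action, confidence)."""
    if "stop" in text.lower():
        return "stop", 1.0
    return "forward", 0.9


def send_to_robot(left, right):
    """Placeholder for your Lab 2 driver call; this version only prints."""
    print(f"wheels: left={left:+.2f} right={right:+.2f}")


def handle(text):
    """Map one text command to wheel speeds and send them to the robot."""
    action, confidence = classify(text)
    if confidence < MIN_CONFIDENCE:
        action = "stop"  # do not move on an uncertain prediction
    speeds = WHEEL_COMMANDS[action]
    send_to_robot(*speeds)
    return action, speeds


def run(lines):
    """Process commands one per line; returns the list of actions taken."""
    actions = []
    try:
        for line in lines:
            if line.strip():
                actions.append(handle(line.strip())[0])
    finally:
        send_to_robot(0.0, 0.0)  # always leave the robot stopped
    return actions
```

For an interactive session, run(iter(input, "quit")) reads commands from the keyboard until you type quit; run(open("commands.txt")) would process a file instead (commands.txt being any text file with one command per line). The finally clause sends zero speeds even if the loop is interrupted.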

Step 7 - Cleanup

When you have collected all the submission artefacts, leave the workstation in a clean state for the next user. From any directory:

deactivate 2>/dev/null; cd ~ && rm -rf /tmp/fossbot-text-to-cmd

deactivate exits the virtual environment (the 2>/dev/null redirection hides the error message if the venv was already inactive), cd ~ steps out of the starter directory so it can be removed, and rm -rf deletes the starter together with the .venv and all its installed packages.

The Hugging Face model cache under ~/.cache/huggingface/ can be left in place - it speeds up the next session and does not contain any session-specific state.

Expected result: ls /tmp/fossbot-text-to-cmd reports No such file or directory.

9. Analysis Questions

Look at the confidences in /tmp/fossbot-text-to-cmd/outputs/sklearn_multilingual.json. Almost every input ended up with the same prediction and the same confidence value (around 0.25). Explain why this happens.
Look at the templates dictionary in /tmp/fossbot-text-to-cmd/src/classifier_st.py. There are only 3 English phrases per action - no Polish, German or Spanish. Why is the sentence-transformer classifier still able to handle multilingual input correctly?
The wheels field in the JSON output uses values in [-1.0, 1.0]. The mapping from action name to wheel speeds is defined in /tmp/fossbot-text-to-cmd/src/wheel_mapping.py. What would you change there to make the robot turn faster on the spot?
The sklearn pipeline is trained from scratch on 75 examples; the sentence-transformer model is loaded already trained. List one advantage and one disadvantage of each approach for a project that needs to recognise 50 different commands instead of 5.

After attempting it yourself, you may review the suggested answer

sklearn

Advantage: deterministic, lightweight and fast - sub-millisecond inference, model file under a megabyte, easy to ship and reproduce.
Disadvantage: the labelled dataset has to grow roughly linearly with the number of commands - 50 commands need around 15-30 examples per class, i.e. 750-1500 labelled phrases to collect and maintain. The model is also brittle on paraphrases or any other words it never saw during training, and it works in a single language only.

Sentence transformer

Advantage: multilingual out of the box and robust to paraphrases - you only need 2-3 reference templates per command (about 150 phrases for 50 commands instead of 1500), no labelled dataset to collect.
Disadvantage: heavier runtime (dependencies and model weights on the order of hundreds of MB, ~100ms inference per query), less interpretable when it makes mistakes, and improving accuracy on specialised commands typically requires fine-tuning the model - which needs a GPU, thousands of paired sentences and significant compute time.

10. Submission Requirements

A screenshot of the working virtual environment from Step 1 (terminal shows the OK line and the (.venv) prompt).
A screenshot of the sklearn test accuracy and /tmp/fossbot-text-to-cmd/outputs/sklearn_basic.json from Step 2.
A screenshot of /tmp/fossbot-text-to-cmd/outputs/st_basic.json from Step 3.
A screenshot or diff comparing /tmp/fossbot-text-to-cmd/outputs/sklearn_multilingual.json and /tmp/fossbot-text-to-cmd/outputs/st_multilingual.json from Step 4.
Short answers to the four analysis questions.

11. References and Open Licence

scikit-learn documentation - https://scikit-learn.org/stable/
pandas documentation - https://pandas.pydata.org/docs/
sentence-transformers documentation - https://www.sbert.net/
Hugging Face model card for paraphrase-multilingual-MiniLM-L12-v2 - https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

Direct links to the specific functions used in this lab are in the TODO comments of the skeleton files (src/classifier_sklearn.py, src/classifier_st.py).

The Creative Commons Attribution 4.0 International (CC BY 4.0) license allows users to share, copy, distribute, and adapt the work, even for commercial purposes, as long as proper credit is given to the original creator.

EU funding disclaimer

Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Education and Culture Executive Agency (EACEA). Neither the European Union nor EACEA can be held responsible for them.