Natural Language Processing
The course covers the full arc of NLP, from classical word representation methods to modern large language models, with an emphasis on mathematical foundations and hands-on implementation.
The lecture slides and notebooks are available on GitHub, and the course is structured as follows:
| # | Topic | Key Concepts |
|---|---|---|
| 1 | Word Embeddings | Distributional hypothesis, co-occurrence matrices, TF-IDF, PMI, LSA, LDA, Word2Vec, GloVe |
| 2 | Embedding exercises | Practical problems on vector spaces and similarity measures |
| 3 | LSA & Word2Vec notebook | SVD-based dimensionality reduction, skip-gram training, analogy tasks |
| 4 | Transformers | Self-attention, positional encoding, encoder/decoder architectures, BERT, GPT, fine-tuning (SFT, RLHF) |
| 5 | Transformer exercises | Conceptual questions on attention, masking, and training objectives |
| 6 | GPT from scratch notebook | Character-level GPT implementation in PyTorch following Karpathy’s nanoGPT |
| 7 | Transformers in Practice | Downstream tasks: text classification, QA, seq2seq, generation; fine-tuning regimes |
| 8 | BERT exercises | Fine-tuning BERT for sentiment analysis; feature extraction vs. full fine-tuning |
| 9 | BERT sentiment notebook | Fine-tune BERT on SST-2; frozen vs. fine-tuned embeddings, error analysis, attention visualisation |
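To give a flavour of the first topic, here is a minimal sketch of building positive pointwise mutual information (PPMI) vectors from a word–word co-occurrence matrix. The toy vocabulary and counts are invented for illustration; they are not taken from the course materials.

```python
import numpy as np

# Hypothetical toy vocabulary and symmetric co-occurrence counts.
vocab = ["cat", "dog", "pet", "car"]
C = np.array([
    [0, 4, 6, 1],
    [4, 0, 5, 1],
    [6, 5, 0, 0],
    [1, 1, 0, 0],
], dtype=float)

def ppmi(C):
    """PPMI(i, j) = max(0, log p(i, j) / (p(i) * p(j))), with 0 for unseen pairs."""
    total = C.sum()
    p_ij = C / total                                # joint probabilities
    p_i = C.sum(axis=1, keepdims=True) / total      # row marginals
    p_j = C.sum(axis=0, keepdims=True) / total      # column marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_ij / (p_i * p_j))
    pmi[~np.isfinite(pmi)] = 0.0                    # zero-count pairs get 0
    return np.maximum(pmi, 0.0)

M = ppmi(C)
```

Each row of `M` is then a sparse embedding for one word; in the course's arc, LSA applies a truncated SVD to exactly this kind of matrix to obtain dense vectors.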
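The transformer topics centre on scaled dot-product self-attention with optional causal masking. A single-head NumPy sketch, with invented toy dimensions (not from the course notebooks), looks like this:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv, causal=False):
    """Single-head scaled dot-product self-attention over a sequence X.

    X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projection matrices.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # similarity of each query to each key
    if causal:
        # Forbid attending to future positions (as in GPT-style decoders).
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, w = self_attention(X, Wq, Wk, Wv, causal=True)
```

With `causal=True` the attention matrix is lower-triangular, which is the masking discussed in the transformer exercises and used by the character-level GPT notebook.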
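Positional encoding, also covered in the transformer lectures, can be sketched with the sinusoidal scheme from the original transformer paper; the sequence length and model dimension below are arbitrary examples.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings:
    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    """
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]      # even dimension indices
    angles = pos / (10000 ** (i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_encoding(seq_len=16, d_model=32)
```

Because each dimension oscillates at a different frequency, every position receives a distinct pattern that the model can use to recover token order.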