Natural Language Processing

From classical word representations to modern large language models.

The course covers the full arc of NLP, with an emphasis on mathematical foundations and hands-on implementation.

The lecture slides and notebooks are available on GitHub, and the course is structured as follows:

| # | Topic | Key Concepts |
|---|-------|--------------|
| 1 | Word Embeddings | Distributional hypothesis, co-occurrence matrices, TF-IDF, PMI, LSA, LDA, Word2Vec, GloVe |
| 2 | Embedding exercises | Practical problems on vector spaces and similarity measures |
| 3 | LSA & Word2Vec notebook | SVD-based dimensionality reduction, skip-gram training, analogy tasks |
| 4 | Transformers | Self-attention, positional encoding, encoder/decoder architectures, BERT, GPT, fine-tuning (SFT, RLHF) |
| 5 | Transformer exercises | Conceptual questions on attention, masking, and training objectives |
| 6 | GPT from scratch notebook | Character-level GPT implementation in PyTorch following Karpathy’s nanoGPT |
| 7 | Transformers in Practice | Downstream tasks: text classification, QA, seq2seq, generation; fine-tuning regimes |
| 8 | BERT exercises | Fine-tuning BERT for sentiment analysis; feature extraction vs. full fine-tuning |
| 9 | BERT sentiment notebook | Fine-tune BERT on SST-2; frozen vs. fine-tuned embeddings, error analysis, attention visualisation |
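To give a flavour of topic 1, the co-occurrence and PMI ideas reduce to a few array operations. The following is a minimal NumPy sketch (the co-occurrence counts are invented for illustration) that turns a toy co-occurrence matrix into a positive-PMI (PPMI) matrix:

```python
import numpy as np

# Toy co-occurrence matrix: rows = target words, columns = context words.
# The counts are illustrative, not derived from a real corpus.
C = np.array([
    [0, 2, 1],
    [2, 0, 3],
    [1, 3, 0],
], dtype=float)

total = C.sum()
p_ij = C / total                               # joint probabilities P(w, c)
p_i = C.sum(axis=1, keepdims=True) / total     # target marginals P(w)
p_j = C.sum(axis=0, keepdims=True) / total     # context marginals P(c)

with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log2(p_ij / (p_i * p_j))          # pointwise mutual information
ppmi = np.nan_to_num(np.maximum(pmi, 0))       # PPMI: clip negatives and -inf/NaN to 0
```

Rows of the resulting `ppmi` matrix can be used directly as sparse word vectors, or fed into SVD for dimensionality reduction as in LSA.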
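Topic 3's SVD-based dimensionality reduction (LSA) can be sketched similarly: truncate the SVD of a term-document matrix to k dimensions to obtain dense term vectors. The matrix below is a toy example with invented counts:

```python
import numpy as np

# Toy term-document count matrix (terms x documents); values are illustrative.
X = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 3, 1],
    [0, 0, 1, 2],
], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                                        # number of latent dimensions to keep
term_vecs = U[:, :k] * s[:k]                 # dense k-dimensional term embeddings
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # best rank-k approximation of X
```

By the Eckart–Young theorem, `X_k` is the closest rank-k matrix to `X` in Frobenius norm, which is what justifies using the truncated factors as low-dimensional embeddings.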
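Finally, the core mechanism of topic 4, scaled dot-product self-attention, fits in a few lines of NumPy. This is a single-head sketch with no masking or batching, and the weight matrices are random placeholders for illustration:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (no mask, no batching)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # (seq, seq) attention logits
    weights = softmax(scores, axis=-1)        # each row is a distribution over positions
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))       # toy input embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
```

The lectures build on this with causal masking, multiple heads, and learned projections; the PyTorch version in the nanoGPT-style notebook follows the same structure.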