Visualization and Dimension Reduction

Published

January 20, 2025

Outline of the course

We consider the generic problem of reducing the dimension of a dataset while preserving as much information as possible. We consider models of the form:

\[ x = f(z) + \varepsilon, \]

where \(x\) is the observed high-dimensional data, \(z\) is a low-dimensional latent variable, \(f\) is a function linking the two variables and \(\varepsilon\) is a noise term. Different choices of \(f\) and of the distribution of \(\varepsilon\) lead to different models for dimension reduction.
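As a concrete instance of this model, choosing a linear map \(f(z) = Wz + \mu\) with Gaussian noise gives the PPCA generative model studied later in the course. A minimal simulation sketch (the dimensions, seed, and noise level below are illustrative assumptions, not values from the course):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): observed dim, latent dim, sample size.
d, k, n = 10, 2, 500

# Linear f(z) = W z + mu with Gaussian noise: the PPCA generative model.
W = rng.normal(size=(d, k))          # loading matrix
mu = rng.normal(size=d)              # mean of the observations
sigma = 0.1                          # noise standard deviation

z = rng.normal(size=(n, k))          # latent variables, z ~ N(0, I_k)
eps = sigma * rng.normal(size=(n, d))
x = z @ W.T + mu + eps               # x = f(z) + eps

print(x.shape)  # each row is one high-dimensional observation
```

Different choices of \(W\), \(\mu\) and \(\sigma\) change which low-dimensional structure the observed data inherit from \(z\).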

We will also study the so-called decoding problem, which consists of estimating \(x\) from \(z\):

\[ \hat x = g(z). \]
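In the linear case, a natural decoder \(g\) is affine. As a hedged sketch (using PCA computed via an SVD on illustrative toy data, not a method prescribed by the course), the latent coordinates \(z\) are obtained by projecting the centered data, and \(\hat x = g(z)\) maps them back:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 200, 5, 2                      # illustrative sizes (assumptions)

# Toy correlated data standing in for high-dimensional observations.
x = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))
mu = x.mean(axis=0)

# PCA via SVD of the centered data: rows of Vt[:k] span the latent subspace.
U, S, Vt = np.linalg.svd(x - mu, full_matrices=False)
V = Vt[:k].T                             # (d, k) projection directions

z = (x - mu) @ V                         # encode: low-dimensional coordinates
x_hat = z @ V.T + mu                     # decode: g(z) = V z + mu

# Mean squared reconstruction error; it shrinks as k grows toward d.
err = np.mean((x - x_hat) ** 2)
print(err)
```

The quality of \(\hat x\) depends on how much of the data's variability the \(k\)-dimensional latent space captures.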

Other approaches to dimension reduction, such as multidimensional scaling (MDS) or t-SNE, will only be covered briefly at the end of the course, if time permits.

The course is organized as follows:

  1. Linear models:
  • Factor Analysis (FA),
  • Probabilistic Principal Component Analysis (PPCA),
  • Independent Component Analysis (ICA).
  2. Non-linear models:
  • Auto-Encoders (AE),
  • Variational Auto-Encoders (VAE).
  3. Multidimensional scaling (MDS) and t-SNE, UMAP (if time permits).
