Visualization and Dimension Reduction
Outline of the course
We consider the generic problem of reducing the dimension of a dataset while preserving as much information as possible. We consider models of the type:
\[ x = f(z) + \varepsilon, \]
where \(x\) is the observed high-dimensional data, \(z\) is a low-dimensional latent variable, \(f\) is a function linking the two variables and \(\varepsilon\) is a noise term. Different choices of \(f\) and of the distribution of \(\varepsilon\) lead to different models for dimension reduction.
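As a concrete instance, here is a minimal sketch of the linear-Gaussian case (the setting of factor analysis and PPCA, covered later), where \(f(z) = Wz + \mu\) and \(\varepsilon\) is isotropic Gaussian noise. The dimensions, loading matrix, and noise level are illustrative assumptions, not part of the course material.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: n samples, latent dimension d, observed dimension p.
n, d, p = 1000, 2, 5

# A linear choice of f: f(z) = W z + mu (factor-analysis / PPCA style).
W = rng.normal(size=(p, d))       # loading matrix (assumed, random here)
mu = np.zeros(p)                  # mean of the observations
sigma = 0.1                       # noise standard deviation (assumed)

z = rng.normal(size=(n, d))               # latent variables z ~ N(0, I_d)
eps = sigma * rng.normal(size=(n, p))     # noise term eps ~ N(0, sigma^2 I_p)
x = z @ W.T + mu + eps                    # observed data x = f(z) + eps

print(x.shape)  # (1000, 5): high-dimensional data from a 2-D latent variable
```

Different choices of the distribution of \(z\), of \(W\), and of \(\varepsilon\) in this sketch correspond to the different linear models listed below.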
We will also study the so-called decoding problem, which consists of reconstructing \(x\) from the latent variable \(z\):
\[ \hat x = g(z). \]
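To make the decoding map \(g\) concrete, the following sketch uses the simplest choice: the linear decoder obtained from PCA, where \(z\) is the projection of centered data onto the top principal directions and \(\hat x = g(z)\) maps back. The toy data and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data lying close to a 2-D subspace of R^5 (illustrative only).
n, d, p = 500, 2, 5
W_true = rng.normal(size=(p, d))
x = rng.normal(size=(n, d)) @ W_true.T + 0.05 * rng.normal(size=(n, p))

# PCA via SVD of the centered data matrix.
mu = x.mean(axis=0)
U, S, Vt = np.linalg.svd(x - mu, full_matrices=False)
V = Vt[:d].T                  # p x d matrix of top principal directions

z = (x - mu) @ V              # encoding: low-dimensional z from x
x_hat = z @ V.T + mu          # decoding: x_hat = g(z), a linear map plus offset

# Relative reconstruction error is small, since the data is near the subspace.
rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
print(rel_err)
```

Non-linear models such as auto-encoders replace this linear \(g\) with a learned neural network.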
Other approaches to dimension reduction, such as multidimensional scaling (MDS) or t-SNE, do not fit this generative framework; they will only be covered briefly at the end of the course, if time permits.
The course is organized as follows:
- Linear models:
  - Factor Analysis (FA),
  - Probabilistic Principal Component Analysis (PPCA),
  - Independent Component Analysis (ICA).
- Non-linear models:
  - Auto-Encoders (AE),
  - Variational Auto-Encoders (VAE).
- Multidimensional scaling (MDS), t-SNE, and UMAP (if time permits).