Visualization and Dimension Reduction

Published

January 20, 2025

Outline of the course

We consider the generic problem of reducing the dimension of a dataset while preserving as much information as possible. We consider models of the form:

\[ x = f(z) + \varepsilon, \]

where \(x\) is the observed high-dimensional data, \(z\) is a low-dimensional latent variable, \(f\) is a function linking the two variables and \(\varepsilon\) is a noise term. Different choices of \(f\) and of the distribution of \(\varepsilon\) lead to different models for dimension reduction.
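As a concrete instance of this model, choosing a linear map \(f(z) = Wz + \mu\) with Gaussian noise gives the PPCA generative model studied later in the course. A minimal simulation sketch (the dimensions, seed, and noise level below are illustrative assumptions, not values from the course):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): observed dim, latent dim, sample size.
d, k, n = 10, 2, 500

# Linear f(z) = W z + mu with Gaussian noise: the PPCA generative model.
W = rng.normal(size=(d, k))          # loading matrix
mu = rng.normal(size=d)              # mean of the observations
sigma = 0.1                          # noise standard deviation

z = rng.normal(size=(n, k))          # latent variables, z ~ N(0, I_k)
eps = sigma * rng.normal(size=(n, d))
x = z @ W.T + mu + eps               # x = f(z) + eps

print(x.shape)  # each row is one high-dimensional observation
```

Different choices of \(W\), \(\mu\) and \(\sigma\) change which low-dimensional structure the observed data inherit from \(z\).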

We will also study the so-called decoding problem, which consists of estimating \(x\) from \(z\):

\[ \hat x = g(z). \]
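In the linear case, a natural decoder \(g\) is affine. As a hedged sketch (using PCA computed via an SVD on illustrative toy data, not a method prescribed by the course), the latent coordinates \(z\) are obtained by projecting the centered data, and \(\hat x = g(z)\) maps them back:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 200, 5, 2                      # illustrative sizes (assumptions)

# Toy correlated data standing in for high-dimensional observations.
x = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))
mu = x.mean(axis=0)

# PCA via SVD of the centered data: rows of Vt[:k] span the latent subspace.
U, S, Vt = np.linalg.svd(x - mu, full_matrices=False)
V = Vt[:k].T                             # (d, k) projection directions

z = (x - mu) @ V                         # encode: low-dimensional coordinates
x_hat = z @ V.T + mu                     # decode: g(z) = V z + mu

# Mean squared reconstruction error; it shrinks as k grows toward d.
err = np.mean((x - x_hat) ** 2)
print(err)
```

The quality of \(\hat x\) depends on how much of the data's variability the \(k\)-dimensional latent space captures.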

Other approaches to dimension reduction, such as multidimensional scaling (MDS) or t-SNE, will only be covered briefly at the end of the course, if time permits.

The course is organized as follows:

  1. Linear models:
  • Factor Analysis (FA),
  • Probabilistic Principal Component Analysis (PPCA),
  • Independent Component Analysis (ICA).
  2. Non-linear models:
  • Auto-Encoders (AE),
  • Variational Auto-Encoders (VAE).
  3. Multidimensional scaling (MDS) and t-SNE, UMAP (if time permits).
