
PCA

PCA stands for principal component analysis. It's a way to convert vectors with many dimensions into vectors with fewer dimensions without losing too much information.

It's useful when:

  • You are struggling with performance.
  • You want to save memory.
  • Your algorithm struggles with high-dimensional data.
  • You want to remove noise from your data.
  • You want to find the most important parts of your data.
  • You want to make a 2D or 3D plot of your data.

The algorithm takes a target dimension $n$ (smaller than the current dimension $m$) and tries to find an $n \times m$ matrix with orthonormal rows such that, after multiplying each vector by the matrix, the points stay "as distinct as possible", meaning the projected data keeps as much of its variance as possible.
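As a rough illustration, here is a minimal sketch of one common way to compute such a matrix, using NumPy's singular value decomposition on mean-centered data. The function name `pca_matrix`, the random sample data, and the choice of SVD are assumptions made for this example, not something prescribed by the description above.

```python
import numpy as np

def pca_matrix(X, n):
    """Return (mean, W), where W is an n x m matrix with orthonormal rows.

    X holds one m-dimensional vector per row. (Hypothetical helper.)
    """
    mean = X.mean(axis=0)
    # The rows of Vt are orthonormal directions, sorted by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n]

# Made-up data: 200 vectors with m = 5, where some axes vary far more than others.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.2, 0.1])

mean, W = pca_matrix(X, n=2)
Z = (X - mean) @ W.T  # each row of Z is a reduced 2-dimensional vector

print(W.shape, Z.shape)
```

Centering the data first is part of this sketch: the matrix is applied to `X - mean`, not to the raw vectors.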

You can think of it as projecting your data onto the best $n$-dimensional plane that keeps the data points distinct. Projecting from 3D to 2D is what your eyes do all the time to see space, and this idea of projection works in any number of dimensions.
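The projection picture can be tried out on a small 3D example. The sketch below assumes synthetic points that lie close to a tilted plane (the data, the noise level, and all variable names are invented for illustration). It projects them onto the best-fitting 2D plane and measures how much of the variance survives.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 300 points near a tilted plane in 3D, plus a little noise.
coords = rng.normal(size=(300, 2)) * np.array([4.0, 2.0])
basis = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])
points = coords @ basis + 0.05 * rng.normal(size=(300, 3))

centered = points - points.mean(axis=0)
_, s, Vt = np.linalg.svd(centered, full_matrices=False)

plane = Vt[:2]             # two orthonormal directions spanning the fitted plane
flat = centered @ plane.T  # 2D coordinates of each point within that plane
restored = flat @ plane    # the same points mapped back into 3D

# Fraction of the total variance that the 2D projection keeps.
retained = (s[:2] ** 2).sum() / (s ** 2).sum()
print(flat.shape, retained)
```

Because these points were generated almost flat, `retained` comes out very close to 1. For data that genuinely fills all three dimensions, the projection would lose more.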