
The Theory and Math Behind Principal Component Analysis (PCA)

Helene
5 min read · Jan 6, 2025


In a previous article, we looked at what dimensionality means in Machine Learning, as well as the Curse of Dimensionality and the problems it causes. We have already seen how feature selection, specifically subset selection, can be used for dimensionality reduction. In this article, we turn to feature extraction. More precisely, we will work through the theory and math behind Principal Component Analysis (PCA), which is a feature extraction method. Some familiarity with Linear Algebra is assumed. This article lays the groundwork for the following one, in which we will implement the method in Python, both with and without libraries.

What is the idea behind PCA?

To start out softly, we will first try to understand the general idea behind PCA. Recall that in the previous article we discussed subset selection: there, we simply chose a subset of k features (with k < d) from our original d features, so the chosen features themselves remained unchanged. This is not the case with PCA. PCA assumes that the information is carried in the variance of the features; in other words, the higher the variance of a feature, the more information that feature carries (a tiny illustration of this follows below). But how exactly does PCA find the amount of variance in each feature?
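As a small, made-up illustration of this assumption (the data below is synthetic, not from the article), consider two features: one spreads the samples out, while the other is almost constant and therefore barely distinguishes the samples at all.

```python
import numpy as np

# Hypothetical dataset: 4 samples, 2 features.
# Feature 0 varies a lot; feature 1 is nearly constant.
X = np.array([
    [2.0, 10.0],
    [4.0, 10.1],
    [6.0,  9.9],
    [8.0, 10.0],
])

# Per-feature variance: roughly [5.0, 0.005].
# Under PCA's assumption, feature 0 carries far more information.
print(np.var(X, axis=0))
```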

PCA identifies a set of orthogonal axes, also called the principal components, that capture the maximum variance in the data.
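To make this concrete, here is a minimal sketch of one common way to find these axes: center the data, build the covariance matrix, and take its eigenvectors, sorted by eigenvalue. The random data, the choice of k, and the NumPy calls are my assumptions for illustration, not the article's implementation (that follows in the next article).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # hypothetical data: 100 samples, 3 features

X_centered = X - X.mean(axis=0)         # center each feature at zero mean
cov = np.cov(X_centered, rowvar=False)  # d x d covariance matrix

# For a symmetric matrix, eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]       # sort descending by variance explained
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                    # keep the top-k principal components
W = eigvecs[:, :k]                       # projection matrix (d x k), orthogonal axes
Z = X_centered @ W                       # data expressed along those axes

print("variance captured by each kept component:", eigvals[:k])
```

Each column of W is a principal component: a direction in the original feature space, and the eigenvalue attached to it is the variance of the data along that direction.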
