Principal components analysis using pandas dataframe

How can I calculate Principal Components Analysis from data in a pandas dataframe?

user3362813 asked Apr 25 '14 00:04


People also ask

How do you do a PCA on a DataFrame in Python?

To do that, one would do something like: pandas.DataFrame(pca.transform(df), columns=['PCA%i' % i for i in range(n_components)], index=df.index), where I've set n_components=5.

How does PCA work in Python?

According to Wikipedia, PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly uncorrelated variables called principal components.
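The "linearly uncorrelated" part of that definition can be checked directly. The following is a minimal sketch, not part of the original answer: the column names `a`, `b`, `c`, the seed, and the sample size are made up for illustration. It builds a dataframe with two strongly correlated columns, runs sklearn's PCA on it, and inspects the covariance of the resulting components.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Illustrative data (assumed, not from the question): column "b" depends on "a".
rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.5, size=200),
    "c": rng.normal(size=200),
})

# Project the rows onto the principal components, keeping the original index.
pca = PCA(n_components=3)
scores = pd.DataFrame(
    pca.fit_transform(df),
    index=df.index,
    columns=["PC1", "PC2", "PC3"],
)

# The input columns are correlated, but the component scores are not:
# the off-diagonal entries of their covariance matrix are numerically zero.
cov = scores.cov().values
off_diag = cov - np.diag(np.diag(cov))

print(df.corr().round(2))
print(np.abs(off_diag).max())
```

The first printout should show a high correlation between `a` and `b`, while the largest off-diagonal covariance between the components should be zero up to floating-point error.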


1 Answer

Most sklearn objects work with pandas dataframes just fine. Would something like this work for you?

    import pandas as pd
    import numpy as np
    from sklearn.decomposition import PCA

    df = pd.DataFrame(data=np.random.normal(0, 1, (20, 10)))

    pca = PCA(n_components=5)
    pca.fit(df)

You can access the components themselves with

pca.components_  
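Since `pca.components_` and `pca.transform(df)` both return plain NumPy arrays, it can help to wrap them back into labelled pandas objects. Here is a rough sketch of one way to do that, assuming the same kind of random data as above; the column names `x0`..`x9`, the `PC1`..`PC5` labels, and the seed are illustrative choices, not something the answer specifies.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Same shape as the example above; column names and seed are assumptions.
df = pd.DataFrame(
    np.random.default_rng(0).normal(0, 1, (20, 10)),
    columns=[f"x{i}" for i in range(10)],
)

n_components = 5
pca = PCA(n_components=n_components).fit(df)
pc_names = [f"PC{i + 1}" for i in range(n_components)]

# One row per component, one column per original feature.
loadings = pd.DataFrame(pca.components_, index=pc_names, columns=df.columns)

# One row per original observation, one column per component.
scores = pd.DataFrame(pca.transform(df), index=df.index, columns=pc_names)

# Fraction of the total variance captured by each component.
explained = pd.Series(pca.explained_variance_ratio_, index=pc_names)

print(loadings.shape, scores.shape)
print(explained)
```

Keeping `index=df.index` on the scores means they can be joined back to the original dataframe afterwards, for example with `df.join(scores)`.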
Akavall answered Sep 22 '22 21:09