How can I calculate Principal Components Analysis from data in a pandas dataframe?
To get the transformed data back as a dataframe, one would do something like: pandas.DataFrame(pca.transform(df), columns=['PCA%i' % i for i in range(n_components)], index=df.index), where pca is an already fitted sklearn PCA object and I've set n_components=5.
According to Wikipedia, PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly uncorrelated variables called principal components.
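To make that definition concrete, here is a minimal sketch using plain NumPy rather than sklearn. The synthetic data and the variable names are made up for illustration. It builds two correlated variables, applies the orthogonal transformation given by the eigenvectors of the covariance matrix, and checks that the resulting components are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two correlated variables (synthetic, for illustration only):
# y depends linearly on x plus some noise.
x = rng.normal(size=500)
y = 0.8 * x + 0.3 * rng.normal(size=500)
X = np.column_stack([x, y])

# Center the data, then eigendecompose its covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# Sort the principal axes by decreasing variance.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the principal axes (an orthogonal transformation).
scores = Xc @ eigvecs
scores_cov = np.cov(scores, rowvar=False)

print(np.round(np.corrcoef(X, rowvar=False), 3))  # original variables are correlated
print(np.round(scores_cov, 6))                    # off-diagonal entries are ~0
```

The diagonal of scores_cov holds the variance along each principal component, which matches the sorted eigenvalues.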
Most sklearn objects work with pandas dataframes just fine. Would something like this work for you?
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA

df = pd.DataFrame(data=np.random.normal(0, 1, (20, 10)))
pca = PCA(n_components=5)
pca.fit(df)
You can access the components themselves with
pca.components_
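Putting the pieces from this thread together, a rough end-to-end sketch might look like the following. The random data is the same kind of placeholder used above, and the names scores, loadings and explained are my own choices, not anything sklearn requires.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

n_components = 5

# Placeholder data: 20 rows, 10 numeric columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(0, 1, (20, 10)))

# Fit PCA directly on the dataframe.
pca = PCA(n_components=n_components)
pca.fit(df)

# Projected data as a dataframe that keeps the original index.
scores = pd.DataFrame(
    pca.transform(df),
    columns=['PCA%i' % i for i in range(n_components)],
    index=df.index,
)

# One row per component, one column per original variable.
loadings = pd.DataFrame(
    pca.components_,
    columns=df.columns,
    index=scores.columns,
)

# Fraction of total variance captured by each component.
explained = pca.explained_variance_ratio_

print(scores.head())
print(explained)
```

Note that sklearn's PCA centers the data but does not scale it, so if your columns are on very different scales you may want to standardize them first.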