
How to find the top three features of every principal component using pandas?

Tags: python, pandas, pca

I am following the solution given here.

But that solution takes only the argmax() feature from each principal component. I want the top three instead. How do I go about it?

Basically, I want to know which features have the greatest impact on each of the PCs.

Thank you.

Kedaar Rao asked Sep 18 '25 02:09



1 Answer

You can get the indices of the largest loadings with np.argsort or np.argpartition. Following the procedure in the linked question:

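import numpy as np
# model (a fitted PCA) and n_pcs are defined as in the linked question

# With argsort: sort by |loading|, reverse to descending order, keep the first three
most_important = [np.argsort(np.abs(model.components_[i]))[::-1][:3] for i in range(n_pcs)]

# With argpartition: selects the three largest without a full sort, but does not
# fully order them, so the three selected indices are sorted by |loading| afterwards
abs_comps = np.abs(model.components_)
top3 = [np.argpartition(abs_comps[i], -3)[-3:] for i in range(n_pcs)]
most_important = [t[np.argsort(abs_comps[i][t])[::-1]] for i, t in enumerate(top3)]

most_important
>>> [array([4, 1, 0]), array([2, 3, 4])]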
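To see why the argpartition result needs that extra sort, here is a small self-contained comparison on a single row of loadings. The numbers are hypothetical, made up for illustration and chosen so the top three indices match PC0 in the output above.

```python
import numpy as np

# Hypothetical loadings of one principal component on five features a..e
loadings = np.array([0.30, -0.55, 0.05, 0.10, 0.76])
abs_load = np.abs(loadings)

# argsort: full O(n log n) sort; reverse for descending order, keep the first three
top3_sorted = np.argsort(abs_load)[::-1][:3]

# argpartition: O(n) selection that only guarantees the last three slots hold
# the three largest values; those three are not fully sorted
top3_unordered = np.argpartition(abs_load, -3)[-3:]

# Sort just those three indices by descending magnitude
top3_from_partition = top3_unordered[np.argsort(abs_load[top3_unordered])[::-1]]

print(top3_sorted, top3_unordered, top3_from_partition)
```

Both routes end up with the indices 4, 1, 0 (features e, b, a). argpartition only pays off when there are many features and k is small.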

Then, to get the names of the most important features for each component:

import pandas as pd

initial_feature_names = ['a','b','c','d','e']

# most_important is already ordered from largest to smallest |loading|, so no reversal is needed
most_important_names = [[initial_feature_names[j] for j in most_important[i]] for i in range(n_pcs)]
dic = {'PC{}'.format(i): most_important_names[i] for i in range(n_pcs)}
pd.DataFrame.from_dict(dic).T
>>>
    0   1   2
PC0 e   b   a
PC1 c   d   e
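Since the question asks about pandas, the same table can also be built with pandas alone: put the absolute loadings in a DataFrame and take Series.nlargest on each row. This is a sketch that assumes `components` has the same layout as scikit-learn's `PCA.components_` (one row per component, one column per feature); the function name `top_k_features` and the example loadings are hypothetical.

```python
import numpy as np
import pandas as pd


def top_k_features(components, feature_names, k=3):
    """Row i lists the k features with the largest |loading| on PC i, strongest first."""
    abs_loadings = pd.DataFrame(
        np.abs(components),
        columns=feature_names,
        index=[f"PC{i}" for i in range(len(components))],
    )
    # Series.nlargest returns the k largest values already sorted in descending order
    rows = [row.nlargest(k).index.tolist() for _, row in abs_loadings.iterrows()]
    return pd.DataFrame(rows, index=abs_loadings.index)


# Hypothetical loadings for two components over features a..e;
# with a fitted scikit-learn PCA you would pass model.components_ instead
components = np.array([
    [0.30, -0.55, 0.05, 0.10, 0.76],
    [0.02, 0.15, -0.80, 0.45, 0.36],
])
result = top_k_features(components, ['a', 'b', 'c', 'd', 'e'])
print(result)
```

With these example loadings the result has the same shape as the table above (PC0: e, b, a; PC1: c, d, e). Changing `k` gives the top two, four, and so on without touching the indexing logic.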
Miguel Trejo answered Sep 19 '25 15:09
