I am following the solution given here.
But that solution takes only the argmax() feature for each principal component. I want the top three. How do I go about it?
I basically want to know which features have maximum impact on each of the PCs, respectively.
Thank you.
You can get the sorted indices with np.argsort or np.argpartition. Following the procedure from the linked question:
# With argsort
most_important = [np.argsort(np.abs(model.components_[i]))[::-1][:3] for i in range(n_pcs)]
# With argpartition
most_important = [np.argpartition(np.abs(model.components_[i]), -3)[-3:] for i in range(n_pcs)]
most_important
>>> [array([4, 1, 0]), array([2, 3, 4])]
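One caveat: np.argpartition only guarantees that the three largest entries land in the last three slots, not that those slots are themselves sorted. Below is a minimal sketch of how to put the argpartition result in decreasing order. The loadings vector is made up for illustration and does not come from a fitted model.

```python
import numpy as np

# Hypothetical loadings of one principal component (not from a fitted model).
loadings = np.array([0.30, -0.50, 0.10, 0.05, 0.80])
magnitudes = np.abs(loadings)

# Full sort, descending: indices of the three largest magnitudes, in order.
top3_argsort = np.argsort(magnitudes)[::-1][:3]

# Partial sort: the same three indices, but in no guaranteed order ...
top3_unordered = np.argpartition(magnitudes, -3)[-3:]
# ... so sort just those three by magnitude, descending.
top3_argpartition = top3_unordered[np.argsort(magnitudes[top3_unordered])[::-1]]

print(top3_argsort)       # [4 1 0]
print(top3_argpartition)  # [4 1 0]
```

For three features out of a handful the difference is negligible; argpartition mainly pays off when the number of features is large.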
Then, to get the names of the most important features for each PC:
initial_feature_names = ['a','b','c','d','e']
# The argsort result is already ordered by decreasing |loading|, so no extra reversal is needed
# (argpartition returns the top three in no guaranteed order)
most_important_names = [[initial_feature_names[j] for j in most_important[i]] for i in range(n_pcs)]
dic = {'PC{}'.format(i): most_important_names[i] for i in range(n_pcs)}
pd.DataFrame.from_dict(dic).T
>>>
     0  1  2
PC0  e  b  a
PC1  c  d  e
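Putting the steps together, here is a self-contained sketch. The helper name top_features_per_pc and the 2 x 5 loadings matrix are assumptions made for illustration; in practice you would pass model.components_ from your fitted PCA instead.

```python
import numpy as np

def top_features_per_pc(components, feature_names, k=3):
    """Map 'PC{i}' to the k feature names with the largest |loading|, largest first."""
    components = np.asarray(components)
    # Sort each row's absolute loadings ascending, reverse to descending, keep k columns.
    order = np.argsort(np.abs(components), axis=1)[:, ::-1][:, :k]
    return {
        "PC{}".format(i): [feature_names[j] for j in row]
        for i, row in enumerate(order)
    }

# Hypothetical loadings matrix standing in for model.components_.
components = np.array([
    [0.30, -0.50, 0.10, 0.05, 0.80],
    [0.05, 0.10, -0.70, 0.60, 0.40],
])
feature_names = ['a', 'b', 'c', 'd', 'e']

top = top_features_per_pc(components, feature_names)
print(top)  # {'PC0': ['e', 'b', 'a'], 'PC1': ['c', 'd', 'e']}
```

If you want the table form, pd.DataFrame.from_dict(top, orient='index') should give one row per PC.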