TL;DR: How do I get column headers for the NumPy array returned by sklearn.preprocessing.PolynomialFeatures?
Let's say I have the following code...
import pandas as pd
import numpy as np
from sklearn import preprocessing as pp
a = np.ones(3)
b = np.ones(3) * 2
c = np.ones(3) * 3
input_df = pd.DataFrame([a,b,c])
input_df = input_df.T
input_df.columns=['a', 'b', 'c']
input_df
a b c
0 1 2 3
1 1 2 3
2 1 2 3
poly = pp.PolynomialFeatures(2)
output_nparray = poly.fit_transform(input_df)
print(output_nparray)
[[ 1. 1. 2. 3. 1. 2. 3. 4. 6. 9.]
[ 1. 1. 2. 3. 1. 2. 3. 4. 6. 9.]
[ 1. 1. 2. 3. 1. 2. 3. 4. 6. 9.]]
How can I get that 3x10 matrix (output_nparray) to carry over the a, b, c labels, so I can see how each output column relates to the input data above?
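scikit-learn 0.18 added a nifty get_feature_names() method! (From scikit-learn 1.0 on it is superseded by get_feature_names_out(), and the old name was removed in 1.2.)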
>> input_df.columns
Index(['a', 'b', 'c'], dtype='object')
>> poly.fit_transform(input_df)
array([[ 1., 1., 2., 3., 1., 2., 3., 4., 6., 9.],
[ 1., 1., 2., 3., 1., 2., 3., 4., 6., 9.],
[ 1., 1., 2., 3., 1., 2., 3., 4., 6., 9.]])
>> poly.get_feature_names(input_df.columns)
['1', 'a', 'b', 'c', 'a^2', 'a b', 'a c', 'b^2', 'b c', 'c^2']
Note that you have to pass it the column names yourself, since sklearn (in this version) doesn't read them off the DataFrame on its own.
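To actually get a labelled result, one option is to wrap the array back into a DataFrame using those names. Here is a minimal sketch, assuming pandas and scikit-learn are installed. The output_df name and the hasattr fallback are my own additions: the check tries get_feature_names_out() first (scikit-learn 1.0+) and falls back to get_feature_names() on older releases.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Same toy input as in the question: three constant columns a, b, c.
input_df = pd.DataFrame({
    "a": np.ones(3),
    "b": np.ones(3) * 2,
    "c": np.ones(3) * 3,
})

poly = PolynomialFeatures(2)
output_nparray = poly.fit_transform(input_df)

# Newer scikit-learn releases provide get_feature_names_out();
# older ones (0.18 up to 1.1) provide get_feature_names().
if hasattr(poly, "get_feature_names_out"):
    feature_names = poly.get_feature_names_out(input_df.columns)
else:
    feature_names = poly.get_feature_names(input_df.columns)

# Re-attach the names (and the original row index) to the transformed array.
output_df = pd.DataFrame(output_nparray,
                         columns=feature_names,
                         index=input_df.index)
print(output_df)
```

Each column of output_df is then labelled with the term it represents, e.g. 'a b' for the product of a and b, so you can select output_df['b c'] directly instead of counting column positions.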