
Sklearn preprocessing - PolynomialFeatures - How to keep column names/headers of the output array / dataframe

TLDR: How to get headers for the output numpy array from the sklearn.preprocessing.PolynomialFeatures() function?


Let's say I have the following code...

import pandas as pd
import numpy as np
from sklearn import preprocessing as pp

a = np.ones(3)
b = np.ones(3) * 2
c = np.ones(3) * 3

input_df = pd.DataFrame([a,b,c])
input_df = input_df.T
input_df.columns=['a', 'b', 'c']

input_df

    a   b   c
0   1   2   3
1   1   2   3
2   1   2   3

poly = pp.PolynomialFeatures(2)
output_nparray = poly.fit_transform(input_df)
print(output_nparray)

[[ 1.  1.  2.  3.  1.  2.  3.  4.  6.  9.]
 [ 1.  1.  2.  3.  1.  2.  3.  4.  6.  9.]
 [ 1.  1.  2.  3.  1.  2.  3.  4.  6.  9.]]

How can I get that 3x10 matrix (output_nparray) to carry over the a, b, c labels, so that each output column shows how it relates to the input data above?

Afflatus asked Apr 19 '16 20:04


1 Answer

scikit-learn 0.18 added a nifty get_feature_names() method!

>>> input_df.columns
Index(['a', 'b', 'c'], dtype='object')

>>> poly.fit_transform(input_df)
array([[ 1.,  1.,  2.,  3.,  1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  1.,  2.,  3.,  1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  1.,  2.,  3.,  1.,  2.,  3.,  4.,  6.,  9.]])

>>> poly.get_feature_names(input_df.columns)
['1', 'a', 'b', 'c', 'a^2', 'a b', 'a c', 'b^2', 'b c', 'c^2']

Note that you have to pass it the column names, since sklearn doesn't read them from the DataFrame by itself.
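To get the labels back onto the data, the names can be passed as the columns of a new DataFrame. Here is a minimal sketch of that, not the only way to do it. It assumes a scikit-learn version that has either get_feature_names_out() (added in 1.0) or the older get_feature_names() (removed in 1.2), and it picks whichever one exists. The name output_df is just for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Same toy data as in the question: three constant columns a, b, c.
input_df = pd.DataFrame({
    "a": np.ones(3),
    "b": np.ones(3) * 2,
    "c": np.ones(3) * 3,
})

poly = PolynomialFeatures(2)
output_nparray = poly.fit_transform(input_df)

# Newer scikit-learn releases expose get_feature_names_out();
# older ones only have get_feature_names(). Use whichever is present.
if hasattr(poly, "get_feature_names_out"):
    names = poly.get_feature_names_out(list(input_df.columns))
else:
    names = poly.get_feature_names(list(input_df.columns))

# Wrap the plain array in a DataFrame so the labels travel with the data.
# output_df is a hypothetical name, not something sklearn provides.
output_df = pd.DataFrame(output_nparray, columns=list(names), index=input_df.index)
print(output_df)
```

Reusing input_df.index keeps the rows aligned with the original frame, which helps if you later join the polynomial features back onto other columns.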

OmerB answered Sep 19 '22 10:09