
How to use sklearn FeatureHasher?

I have a dataframe like this:

import pandas as pd
test = pd.DataFrame({'type': ['a', 'b', 'a', 'c', 'b'], 'model': ['bab', 'ba', 'ba', 'ce', 'bw']})

How do I use the sklearn FeatureHasher on it?

I tried:

from sklearn.feature_extraction import FeatureHasher 
FH = FeatureHasher()
train = FH.transform(test.type)

but it doesn't accept that. It seems to want a string or a list, so I tried:

FH.transform(test.to_dict(orient='list'))

but that doesn't work either. I get:

AttributeError: 'str' object has no attribute 'items'

thanks

KillerSnail asked Nov 22 '16 10:11


1 Answer

You need to specify the input type when initializing your instance of FeatureHasher:

In [1]:
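from sklearn.feature_extraction import FeatureHasher
h = FeatureHasher(n_features=5, input_type='string')
# Each sample must be an iterable of strings, so wrap every value in a list;
# recent scikit-learn versions reject a bare string as a sample.
f = h.transform([[t] for t in test.type])
f.toarray()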

Out[1]:
array([[ 1.,  0.,  0.,  0.,  0.],
       [ 0., -1.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  0., -1.,  0.,  0.],
       [ 0., -1.,  0.,  0.,  0.]])
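To hash both columns at once, one option is the default 'dict' input type, which expects one dict per row rather than a single dict for the whole frame. The second attempt in the question passed a single dict, and iterating over a dict yields its string keys, which is where the "'str' object has no attribute 'items'" error comes from. Below is a minimal sketch, assuming a scikit-learn version that accepts string values in dict input (as far as I know they are hashed as "name=value" with an implied value of 1). The names records, hasher and hashed, and n_features=8, are arbitrary choices for illustration:

```python
import pandas as pd
from sklearn.feature_extraction import FeatureHasher

test = pd.DataFrame({'type': ['a', 'b', 'a', 'c', 'b'],
                     'model': ['bab', 'ba', 'ba', 'ce', 'bw']})

# One dict per row, e.g. {'type': 'a', 'model': 'bab'}
records = test.to_dict(orient='records')

# n_features=8 is arbitrary; small sizes make hash collisions more likely
hasher = FeatureHasher(n_features=8, input_type='dict')
hashed = hasher.transform(records)  # scipy sparse matrix, one row per record

print(hashed.shape)
print(hashed.toarray())
```

Since FeatureHasher is stateless, there is no need to call fit first, and transforming the same records again gives the same matrix.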

Note that this assumes the value of each feature is 1, according to the FeatureHasher documentation:

input_type : string, optional, default “dict”

  • Either “dict” (the default) to accept dictionaries over (feature_name, value);
  • “pair” to accept pairs of (feature_name, value);
  • or “string” to accept single strings.

feature_name should be a string, while value should be a number. In the case of “string”, a value of 1 is implied.

The feature_name is hashed to find the appropriate column for the feature. The value’s sign might be flipped in the output (but see non_negative, below).
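As far as I know, the non_negative option mentioned in that excerpt has been removed from recent scikit-learn releases, and the sign flipping is controlled by the alternate_sign parameter instead. Below is a minimal sketch, assuming a version that has alternate_sign, showing that turning it off keeps every hashed value non-negative. The types list simply repeats the 'type' column from the question:

```python
from sklearn.feature_extraction import FeatureHasher

types = ['a', 'b', 'a', 'c', 'b']

# alternate_sign=False disables the sign flip applied to hashed values
h = FeatureHasher(n_features=5, input_type='string', alternate_sign=False)

# Wrap each value in a list so every sample is an iterable of strings
f = h.transform([[t] for t in types]).toarray()

print(f)  # each row holds a single 1.0 in the column its value hashes to
```

Without the sign flip, colliding features always add up rather than partly cancelling, so whether to disable it depends on what the downstream model expects.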

Julien Marrec answered Oct 12 '22 22:10