How to use sklearn FeatureHasher?

Question

I have a dataframe like this:

import pandas as pd
test = pd.DataFrame({'type': ['a', 'b', 'a', 'c', 'b'], 'model': ['bab', 'ba', 'ba', 'ce', 'bw']})

How do I use the sklearn FeatureHasher on it?

I tried:

from sklearn.feature_extraction import FeatureHasher 
FH = FeatureHasher()
train = FH.transform(test.type)

but it doesn't accept that. It seems to want a string or a list, so I tried:

FH.transform(test.to_dict(orient='list'))

but that doesn't work either. I get:

AttributeError: 'str' object has no attribute 'items'

thanks

Julien Marrec · Accepted Answer

You need to specify the input type when initializing your instance of FeatureHasher:

In [1]:
from sklearn.feature_extraction import FeatureHasher
h = FeatureHasher(n_features=5, input_type='string')
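# Each sample must be an iterable of strings; recent scikit-learn versions reject bare strings
f = h.transform([[t] for t in test.type])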
f.toarray()

Out[1]:
array([[ 1.,  0.,  0.,  0.,  0.],
       [ 0., -1.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  0., -1.,  0.,  0.],
       [ 0., -1.,  0.,  0.,  0.]])

Note that with input_type='string' the value of each feature is assumed to be 1, according to the FeatureHasher documentation:

input_type : string, optional, default “dict”

Either “dict” (the default) to accept dictionaries over (feature_name, value); “pair” to accept pairs of (feature_name, value); or “string” to accept single strings. feature_name should be a string, while value should be a number. In the case of “string”, a value of 1 is implied.
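To make the three input types concrete, here is a minimal sketch that hashes the type column once per input_type and checks that the results agree. The last part hashes both columns at once with to_dict(orient='records'), which yields one dict per row; by contrast, to_dict(orient='list') yields a single dict for the whole frame, so iterating over it produces column-name strings, which would explain the 'str' object has no attribute 'items' error in the question. The variable names and the choice of n_features are illustrative, and string values inside dicts are assumed to be supported by your scikit-learn version.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction import FeatureHasher

test = pd.DataFrame({'type': ['a', 'b', 'a', 'c', 'b'],
                     'model': ['bab', 'ba', 'ba', 'ce', 'bw']})

# "string": each sample is an iterable of strings; a value of 1 is implied
as_string = FeatureHasher(n_features=5, input_type='string').transform(
    [[t] for t in test['type']])

# "pair": each sample is an iterable of (feature_name, value) pairs
as_pair = FeatureHasher(n_features=5, input_type='pair').transform(
    [[(t, 1)] for t in test['type']])

# "dict": each sample is a dict mapping feature_name to value
as_dict = FeatureHasher(n_features=5, input_type='dict').transform(
    [{t: 1} for t in test['type']])

# All three encodings describe the same features, so the matrices should match
same = (np.array_equal(as_string.toarray(), as_pair.toarray())
        and np.array_equal(as_string.toarray(), as_dict.toarray()))
print(same)

# Hash both columns together: one dict per row, e.g. {'type': 'a', 'model': 'bab'}.
# Assumption: string values are turned into "name=value" features with value 1.
both = FeatureHasher(n_features=8, input_type='dict').transform(
    test.to_dict(orient='records'))
print(both.shape)
```

A larger n_features reduces the chance that two different feature names land in the same column; the small values here are only for readability.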

The feature_name is hashed to find the appropriate column for the feature. The value’s sign might be flipped in the output (but see non_negative, below).
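The sign flipping explains the -1 entries in the output above. As far as I know, the non_negative option mentioned in the quoted documentation was replaced by alternate_sign in later scikit-learn releases, so treat the parameter name below as an assumption about your installed version. A minimal sketch comparing the two settings:

```python
import numpy as np
from sklearn.feature_extraction import FeatureHasher

# Same values as the 'type' column, each wrapped as a one-element sample
samples = [[t] for t in ['a', 'b', 'a', 'c', 'b']]

# Default behaviour: the sign of the hash decides the sign of the stored value
signed = FeatureHasher(n_features=5, input_type='string').transform(samples).toarray()

# alternate_sign=False (assumed available in your version) keeps values as given
unsigned = FeatureHasher(n_features=5, input_type='string',
                         alternate_sign=False).transform(samples).toarray()

# The column positions are unchanged; only the signs differ
columns_match = np.array_equal(np.abs(signed), unsigned)
print(columns_match)
print(unsigned)
```

Keeping the alternating sign lets colliding features partly cancel out rather than pile up, so disabling it mainly makes sense when a downstream estimator requires non-negative input.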

Tags:

python

pandas

scikit-learn

KillerSnail
