I have a dataframe like this:
import pandas as pd
test = pd.DataFrame({'type': ['a', 'b', 'a', 'c', 'b'], 'model': ['bab', 'ba', 'ba', 'ce', 'bw']})
How do I use the sklearn FeatureHasher on it?
I tried:
from sklearn.feature_extraction import FeatureHasher
FH = FeatureHasher()
train = FH.transform(test.type)
but that doesn't work. It seems to want a string or a list, so I tried:
FH.transform(test.to_dict(orient='list'))
but that doesn't work either. I get:
AttributeError: 'str' object has no attribute 'items'
thanks
You need to specify the input type when initializing your instance of FeatureHasher:
In [1]:
from sklearn.feature_extraction import FeatureHasher
h = FeatureHasher(n_features=5, input_type='string')
f = h.transform(test.type)
f.toarray()
Out[1]:
array([[ 1., 0., 0., 0., 0.],
[ 0., -1., 0., 0., 0.],
[ 1., 0., 0., 0., 0.],
[ 0., 0., -1., 0., 0.],
[ 0., -1., 0., 0., 0.]])
Note that with input_type='string' the value of each feature is assumed to be 1, according to the documentation linked above (bold emphasis is mine):
input_type : string, optional, default “dict”
- Either “dict” (the default) to accept dictionaries over (feature_name, value);
- “pair” to accept pairs of (feature_name, value);
- or “string” to accept single strings.
feature_name should be a string, while value should be a number. In the case of “string”, a value of 1 is implied.
The feature_name is hashed to find the appropriate column for the feature. The value’s sign might be flipped in the output (but see non_negative, below).
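For completeness, here is a minimal sketch of both input types applied to the dataframe from the question. It rests on a few assumptions that are not in the original post: n_features=8 is an arbitrary choice, the variable names are made up for illustration, and each value is wrapped in a one-element list because, as far as I can tell, recent scikit-learn releases reject a bare string as a sample when input_type='string'. The dict variant also shows why to_dict(orient='list') fails: the hasher wants one dict per row, which orient='records' provides.

```python
import pandas as pd
from sklearn.feature_extraction import FeatureHasher

test = pd.DataFrame({'type': ['a', 'b', 'a', 'c', 'b'],
                     'model': ['bab', 'ba', 'ba', 'ce', 'bw']})

# input_type='string': every sample is an iterable of strings, so each
# value is wrapped in a one-element list (assumption: needed on recent
# scikit-learn versions, harmless on older ones).
string_hasher = FeatureHasher(n_features=8, input_type='string')
type_hashed = string_hasher.transform([[value] for value in test['type']])

# input_type='dict' (the default): pass one dict per row, not one dict
# per column. String values are hashed as "column=value" with value 1.
dict_hasher = FeatureHasher(n_features=8, input_type='dict')
rows = test.to_dict(orient='records')
both_hashed = dict_hasher.transform(rows)

print(type_hashed.toarray())
print(both_hashed.toarray())
```

Both results are sparse matrices with one row per dataframe row and n_features columns. Rows that share the same type hash to the same column in the first matrix. Which column that is, and the sign of the entry, depend on the hash, so don't rely on specific positions.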