How can I vectorize a list using sklearn DictVectorizer?

Question

I found the following example in the sklearn docs:

>>> measurements = [
...     {'city': 'Dubai', 'temperature': 33.},
...     {'city': 'London', 'temperature': 12.},
...     {'city': 'San Fransisco', 'temperature': 18.},
... ]

>>> from sklearn.feature_extraction import DictVectorizer
>>> vec = DictVectorizer()

>>> vec.fit_transform(measurements).toarray()
array([[  1.,   0.,   0.,  33.],
       [  0.,   1.,   0.,  12.],
       [  0.,   0.,   1.,  18.]])

>>> vec.get_feature_names()
['city=Dubai', 'city=London', 'city=San Fransisco', 'temperature']

I need to vectorize a list of dicts that looks like this:

>>> measurements = [
...     {'city': ['Dubai','London'], 'temperature': 33.},
...     {'city': ['London','San Fransisco'], 'temperature': 12.},
...     {'city': ['San Fransisco'], 'temperature': 18.},
... ]

to get the following result:

array([[  1.,   1.,   0.,  33.],
       [  0.,   1.,   1.,  12.],
       [  0.,   0.,   1.,  18.]])

That is, a value in the dict should be allowed to be a list (or a tuple, etc.).

Can I do this using DictVectorizer, or in any other way?

Fred Foo · Accepted Answer

Change the representation to

>>> measurements = [
...     {'city=Dubai': True, 'city=London': True, 'temperature': 33.},
...     {'city=London': True, 'city=San Fransisco': True, 'temperature': 12.},
...     {'city': 'San Fransisco', 'temperature': 18.},
... ]

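DictVectorizer names a string-valued feature key=value, so 'city': 'San Fransisco' in the third record lands in the same column as 'city=San Fransisco': True. The result is then exactly what you expect: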

>>> vec.fit_transform(measurements).toarray()
array([[  1.,   1.,   0.,  33.],
       [  0.,   1.,   1.,  12.],
       [  0.,   0.,   1.,  18.]])
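
If the data already exists in the list-valued form from the question, the conversion to this representation can be automated. Below is a minimal sketch, not part of the original answer: expand_list_values is a hypothetical helper name, and it assumes every list, tuple, or set value should become one boolean key=item feature per element.

```python
def expand_list_values(record):
    """Rewrite list-like values as boolean 'key=item' features.

    Hypothetical helper: {'city': ['Dubai', 'London']} becomes
    {'city=Dubai': True, 'city=London': True}. Other values are kept as-is.
    """
    expanded = {}
    for key, value in record.items():
        if isinstance(value, (list, tuple, set)):
            # One boolean feature per element, named like DictVectorizer's
            # own 'key=value' features for string values.
            for item in value:
                expanded[f"{key}={item}"] = True
        else:
            expanded[key] = value
    return expanded


measurements = [
    {'city': ['Dubai', 'London'], 'temperature': 33.},
    {'city': ['London', 'San Fransisco'], 'temperature': 12.},
    {'city': ['San Fransisco'], 'temperature': 18.},
]

expanded = [expand_list_values(m) for m in measurements]
for record in expanded:
    print(record)
```

Passing expanded to vec.fit_transform(...) instead of the hand-written dicts should then give the same array as above, since each element of a list ends up as its own city=... column.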

Tags: python, scikit-learn

fi11er
