I'm wondering if anyone knows how to vectorize feature hashing in Python. For example, this is my code:
import numpy as np
hashlen = 5
x = np.array([4, 7, 4, 2, 6, 8, 0, 6, 3, 1])
h = np.array([0, 3, 1, 2, 4, 2, 1, 0, 3, 1])
In feature hashing, h represents the indices of the new vector I am hashing x to, i.e the index 0 of the hashed vector should have 4 and 6 summed up, index 1 should have 4, 0 and 1 summed up, etc. The resulting hashed vector should be:
w = np.array([ 10, 5, 10, 10, 6])
One way of doing this is of course by looping through the hash indices, i.e:
for itr in range(hashlen):
w[itr] = np.sum(x[np.where(h==itr)])
For large vectors, the complexity is a function of hashlen (the length of the hashed vector). It could take too long, especially with a np.where() in it.
I want to do something like:
w = np.zeros(hashlen)
w[h]+= x
However, the result of this is the same as doing
w = np.zeros(hashlen)
w[h] = x
Can anyone let me know if I'm missing something here? Or if there's an 'easy' way of doing the feature hashing that doesn't involve too many computations?
You can use bincount with weights to do what you are asking:
>>> np.bincount(h,weights=x)
array([ 10., 5., 10., 10., 6.])
For matrices:
>>> import numpy as np
>>> a=np.random.randint(0,5,(50,50))
>>> rand=np.random.rand(5)
>>> rand
array([ 0.10899745, 0.35296303, 0.21127571, 0.56433924, 0.27895281])
>>> b=np.take(rand,a)
#Unfortunately you cannot do it like this:
>>> np.bincount(a,weights=b)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: object too deep for desired array
#There we go:
>>> np.bincount(a.flat,weights=b.flat)
array([ 55.04371257, 172.59892108, 96.34172236, 297.40677707,
145.89232039])
This used fancy indexing to see what was happening:
>>> np.bincount(a.flat)
array([505, 489, 456, 527, 523])
>>> np.bincount(a.flat)*rand
array([ 55.04371257, 172.59892108, 96.34172236, 297.40677707,
145.89232039])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With