I am trying to use NumPy's optimized built-in functions to generate a thermometer encoding. Thermometer encoding basically means generating n 1's in a vector of a given length. For example, with length 8, 3 will be encoded as:
1 1 1 0 0 0 0 0
Using NumPy to generate that vector from an integer input is basically slicing and setting the slice to 1:
stream[:num_ones] = 1
So my question is: given a vector as input, what is the best way to generate a matrix output? For instance:
[2 3 4 1]
as input should produce:
[[1 1 0 0 0 0 0 0],
[1 1 1 0 0 0 0 0],
[1 1 1 1 0 0 0 0],
[1 0 0 0 0 0 0 0]]
My current solution iterates over a zero matrix of the required size and sets the required number of elements in each row to 1 using the slicing method above. Is there a faster way to do this?
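For reference, the loop-based approach described above might look roughly like the following sketch. The function name `thermometer_loop` and the `length` parameter are illustrative assumptions, not taken from the original code.

```python
import numpy as np

def thermometer_loop(counts, length=8):
    """Row-by-row baseline: slice-assign ones into a zero matrix."""
    counts = np.asarray(counts)
    out = np.zeros((len(counts), length), dtype=np.uint8)
    for row, num_ones in zip(out, counts):
        # Each `row` is a view into `out`, so this writes into the matrix.
        row[:num_ones] = 1
    return out

print(thermometer_loop([2, 3, 4, 1]))
```

This is the version a vectorized solution would need to beat: it does one Python-level iteration per input value.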
I'd never heard of "thermometer encoding" before, but once you realise how similar it is to one-hot encoding, it becomes clear you can get there using bit shift ops:
>>> a = np.array([2, 3, 4, 1], dtype=np.uint8)
>>> print(np.fliplr(np.unpackbits((1 << a) - 1).reshape(-1,8)))
[[1 1 0 0 0 0 0 0]
[1 1 1 0 0 0 0 0]
[1 1 1 1 0 0 0 0]
[1 0 0 0 0 0 0 0]]
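To make the one-liner easier to follow, here is the same computation split into steps. This is only an unpacked sketch of the expression above; the intermediate names (`masks`, `bits`, `rows`, `result`) are introduced here for illustration.

```python
import numpy as np

a = np.array([2, 3, 4, 1], dtype=np.uint8)

# (1 << n) - 1 sets the lowest n bits, e.g. n = 3 gives 0b00000111.
masks = (1 << a) - 1

# unpackbits expands each uint8 into 8 bits, most significant bit first,
# and returns them as one flat array.
bits = np.unpackbits(masks)

# One row of 8 bits per input value; the ones sit at the right-hand end.
rows = bits.reshape(-1, 8)

# Mirror each row so the ones come first.
result = np.fliplr(rows)
print(result)
```

Because `np.unpackbits` only accepts `uint8` input, a single pass like this yields at most 8 columns.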
Edit: You can generalise the idea to larger integers by working in 8-column chunks:
a = np.array([2, 13, 4, 0, 1, 17], dtype=np.uint8)
out = np.empty((len(a), 0), dtype=np.uint8)
while a.any():
    block = np.fliplr(np.unpackbits((1 << a) - 1).reshape(-1, 8))
    out = np.concatenate([out, block], axis=1)
    a = np.where(a < 8, 0, a - 8)
print(out)
[[1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0]
[1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0]]
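If you want to reuse the chunked idea, it could be wrapped in a function along these lines. This is a sketch under assumptions rather than the code above verbatim: the name `thermometer_chunked` is hypothetical, and it clamps each chunk to 8 and shifts in `uint16`, so that values of 8 or more do not rely on how `1 << a` behaves when it overflows `uint8`. The last lines cross-check the result against a plain broadcasting comparison, which is a different technique from the bit-shift approach and is used here only as a reference.

```python
import numpy as np

def thermometer_chunked(values):
    """Chunked bit-shift thermometer encoding, 8 columns per pass.

    Assumes every value fits in uint8 (0..255).
    """
    a = np.asarray(values, dtype=np.uint8)
    out = np.empty((len(a), 0), dtype=np.uint8)
    while a.any():
        # Number of ones each row contributes to this 8-column block.
        ones = np.minimum(a, 8)
        # Shift in uint16 so that 1 << 8 is 256, giving a mask of 255.
        masks = ((1 << ones.astype(np.uint16)) - 1).astype(np.uint8)
        block = np.fliplr(np.unpackbits(masks).reshape(-1, 8))
        out = np.concatenate([out, block], axis=1)
        # Remove the ones already emitted; this never goes below zero.
        a = a - ones
    return out

values = [2, 13, 4, 0, 1, 17]
out = thermometer_chunked(values)

# Reference result via broadcasting: column index < number of ones.
expected = (np.arange(out.shape[1]) < np.array(values)[:, None]).astype(np.uint8)
print(np.array_equal(out, expected))
```

Note that the output width is the largest value rounded up to a multiple of 8, so an all-zero input produces a matrix with zero columns.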