Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Vectorizing this non-unique-key operation

I have a non-unique original data called test. Using this input, I want to create an output vector together with a set of rows that get non-zero output, and the data, that contains their output.

import numpy as np

rows = np.array([3, 4])
test = np.array([1, 3, 3, 4, 5])
data = np.array([-1, 2])

My expected output is a vector of shape test.shape.

Each element in output:

  • if element is in rows with index i, output[i] = data[i]
  • otherwise, output[i] = 0

In other words, the following generates my output.

output = np.zeros(test.shape)
for i, val in enumerate(rows):
    output[test == val] = data[i]

Is there any way of vectorizing this?

like image 920
FooBar Avatar asked Jun 13 '26 00:06

FooBar


1 Answers

Here's a vectorized approach based upon searchsorted -

# Get sorted index positions
idx = np.searchsorted(rows, test)

# Set out-of-bounds(invalid ones) to some dummy index, say 0
idx[idx==len(rows)] = 0

# Get invalid mask array found out by indexing data array
# with those indices and looking for matches
invalid_mask = rows[idx] != test

# Get data indexed array as output and set invalid places with 0s
out = data[idx]
out[invalid_mask] = 0

Last couple of lines could have two alternatives, if you dig one-liners -

out = data[idx] * (rows[idx] == test) # skips using `invalid_mask`

out = np.where(invalid_mask, 0, data[idx])
like image 180
Divakar Avatar answered Jun 15 '26 14:06

Divakar



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!