
Numpy - find most common item per row

I have a matrix like this in NumPy:

array([[0, 0, 1, 1],
       [1, 1, 0, 2],
       [0, 0, 1, 0],
       [0, 2, 1, 1],
       [1, 1, 1, 0],
       [1, 0, 2, 2]])

I'd like to get the most common value per row. In other words, I'd like to get a vector like this:

array([0, 1, 0, 1, 1, 2])

I managed to solve this problem using SciPy's mode function, in the following way:

scipy.stats.mode(data, axis=1)[0].flatten()

However, I'm looking for a solution that uses NumPy only. Moreover, the solution needs to work with negative integer values as well.

David Lasry asked Oct 29 '25 02:10


2 Answers

Supposing m is the name of your matrix and it contains only non-negative integers (np.bincount does not accept negative values):

most_f = np.array([np.bincount(row).argmax() for row in m])
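
For negative integers, one possible workaround is to shift every value by the matrix minimum before counting, then shift the result back. This is only a sketch, assuming an integer matrix; the helper name row_modes is made up for illustration:

```python
import numpy as np

def row_modes(m):
    """Most common value per row; ties resolve to the smallest value."""
    m = np.asarray(m)
    offset = m.min()  # shift so the smallest value maps to 0
    # bincount needs non-negative ints, so count the shifted row, then undo the shift
    return np.array([np.bincount(row - offset).argmax() + offset for row in m])

m = np.array([[0, 0, 1, 1],
              [1, 1, 0, 2],
              [0, 0, 1, 0],
              [0, 2, 1, 1],
              [1, 1, 1, 0],
              [1, 0, 2, 2]])

result = row_modes(m)               # the matrix from the question
negative_result = row_modes(m - 5)  # same matrix shifted into negative values
```

Keep in mind that np.bincount allocates one counter per integer between the minimum and maximum, so this is best suited to values spanning a small range.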

I hope this answers your question.

Borja_042 answered Oct 31 '25 17:10


If your labels are from 0 to n_labels - 1, you can use

labels_onehot = m[..., None] == np.arange(n_labels)[None, None, :]  # (n_rows, n_cols, n_labels), one-hot encoded
labels_count = np.count_nonzero(labels_onehot, axis=1)              # (n_rows, n_labels), number of occurrences of each label in a row
most_frequent = np.argmax(labels_count, axis=-1)                    # (n_rows,), the most frequent label in each row

This is fully vectorized (no list comprehension, no apply_along_axis), so it is typically faster than the solutions proposed above (and kind of simpler too), at the cost of an intermediate boolean array of shape (n_rows, n_cols, n_labels).
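
As a quick check, here is a sketch of that approach run on the matrix from the question, taking the argmax over the per-row counts. The value n_labels = 3 is an assumption based on the labels 0, 1 and 2 that appear in it:

```python
import numpy as np

m = np.array([[0, 0, 1, 1],
              [1, 1, 0, 2],
              [0, 0, 1, 0],
              [0, 2, 1, 1],
              [1, 1, 1, 0],
              [1, 0, 2, 2]])
n_labels = 3  # assumed: labels are 0, 1, 2

# Compare every element against every label: shape (6, 4, 3)
labels_onehot = m[..., None] == np.arange(n_labels)
# Count matches along the column axis: shape (6, 3)
labels_count = np.count_nonzero(labels_onehot, axis=1)
# Pick the label with the highest count in each row: shape (6,)
most_frequent = np.argmax(labels_count, axis=-1)
```

When two labels tie within a row, np.argmax returns the first one, that is, the smallest label.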

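If your labels are not from 0 to n_labels - 1 (for example, if some are negative), you can replace np.arange(n_labels) above with an array of your label values. The argmax then gives positions in that array, so index the array with it to recover the labels themselves.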
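
A minimal sketch of that variant, assuming the label set is not known in advance and is taken from the data with np.unique. The function name row_modes_vectorized and the sample matrix are hypothetical:

```python
import numpy as np

def row_modes_vectorized(m):
    """Most common value per row for arbitrary (including negative) labels."""
    m = np.asarray(m)
    labels = np.unique(m)                      # sorted distinct values
    onehot = m[..., None] == labels            # (n_rows, n_cols, n_labels)
    counts = np.count_nonzero(onehot, axis=1)  # (n_rows, n_labels)
    # argmax gives a position in `labels`; index back to get the actual value
    return labels[np.argmax(counts, axis=-1)]

m = np.array([[-3, -3, 7, 7],
              [7, 7, -3, 10],
              [-3, -3, 7, -3]])

modes = row_modes_vectorized(m)
```

Because np.unique returns the labels sorted, ties within a row resolve to the smallest value.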

tbrugere answered Oct 31 '25 17:10


