Numpy

Question

I have a matrix like this in NumPy:

array([[0, 0, 1, 1],
       [1, 1, 0, 2],
       [0, 0, 1, 0],
       [0, 2, 1, 1],
       [1, 1, 1, 0],
       [1, 0, 2, 2]])

I'd like to get the most common value per row. In other words, I'd like to get a vector like this:

array([0, 1, 0, 1, 1, 2])

I managed to solve this problem using SciPy's mode function, in the following way:

scipy.stats.mode(data, axis=1)[0].flatten()

However, I'm looking for a solution that uses NumPy only. Moreover, the solution needs to work with negative integer values as well.

Borja_042 · Accepted Answer

Supposing m is the name of your matrix:

most_f = np.array([np.bincount(row).argmax() for row in m])

I hope this solves your question.
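One caveat for the negative-value requirement in the question: np.bincount only accepts non-negative integers, so the one-liner above raises an error if a row contains a negative value. A minimal sketch of one workaround, assuming an integer 2-D array with a modest value range, is to shift everything by the global minimum before counting and shift back afterwards. The function name row_mode_bincount is hypothetical and used only for illustration.

```python
import numpy as np

def row_mode_bincount(m):
    """Most frequent value per row of an integer 2-D array (negatives allowed)."""
    m = np.asarray(m)
    offset = m.min()      # smallest value in the whole array
    shifted = m - offset  # now every entry is >= 0, as np.bincount requires
    # Count per row, take the index of the largest count, then undo the shift.
    return np.array([np.bincount(row).argmax() for row in shifted]) + offset

m = np.array([[0, 0, 1, 1],
              [1, 1, 0, 2],
              [0, 0, 1, 0],
              [0, 2, 1, 1],
              [1, 1, 1, 0],
              [1, 0, 2, 2]])

result = row_mode_bincount(m)          # matrix from the question
result_neg = row_mode_bincount(m - 5)  # same matrix shifted into negative values
```

On ties, argmax returns the first maximum, so the smallest of the tied values is reported. Since np.bincount allocates one counter per integer between the minimum and maximum, this sketch is a poor fit for data with a very wide value range.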

tbrugere · Answer

If your labels are from 0 to n_labels - 1, you can use:

labels_onehot = m[..., None] == np.arange(n_labels)[None, None, :]  # (n_rows, n_cols, n_labels), one-hot encoded
labels_count = np.count_nonzero(labels_onehot, axis=1)              # (n_rows, n_labels), number of occurrences of each label in a row
most_frequent = np.argmax(labels_count, axis=-1)                    # (n_rows,), the most frequent label in each row

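This is fully vectorized (no list comprehension, no apply_along_axis), so it is typically faster than the loop-based solutions proposed above, and arguably simpler too. The trade-off is an intermediate boolean array of shape (n_rows, n_cols, n_labels), which can get large when there are many distinct labels.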

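If your labels are not from 0 to n_labels - 1, you can replace np.arange(n_labels) above with an array listing your label values. In that case np.argmax returns positions within that array rather than the labels themselves, so index the label array with the result to get the most frequent label per row.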
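As a rough sketch of the arbitrary-label case, the label array can be taken from np.unique(m), which also covers negative integers. The function name row_mode_onehot is hypothetical, and using np.unique to build the labels is an assumption rather than something the answer specifies.

```python
import numpy as np

def row_mode_onehot(m):
    """Most frequent value per row, for arbitrary (possibly negative) labels."""
    m = np.asarray(m)
    labels = np.unique(m)                      # sorted distinct values in m
    onehot = m[..., None] == labels            # (n_rows, n_cols, n_labels)
    counts = np.count_nonzero(onehot, axis=1)  # (n_rows, n_labels)
    # argmax gives a position in `labels`; index back to get the actual value.
    return labels[np.argmax(counts, axis=-1)]

m = np.array([[0, 0, 1, 1],
              [1, 1, 0, 2],
              [0, 0, 1, 0],
              [0, 2, 1, 1],
              [1, 1, 1, 0],
              [1, 0, 2, 2]])

modes = row_mode_onehot(m)          # matrix from the question
modes_neg = row_mode_onehot(m - 5)  # same matrix shifted into negative values
```

Because np.unique returns its values sorted and argmax picks the first maximum, ties resolve to the smallest tied value.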

Numpy - find most common item per row

Tags:

python

scipy

David Lasry
