I have a matrix like this in NumPy:
array([[0, 0, 1, 1],
       [1, 1, 0, 2],
       [0, 0, 1, 0],
       [0, 2, 1, 1],
       [1, 1, 1, 0],
       [1, 0, 2, 2]])
I'd like to get the most common value per row. In other words, I'd like to get a vector like this:
array([0, 1, 0, 1, 1, 2])
I managed to solve this problem using SciPy's mode function, in the following way:
scipy.stats.mode(data, axis=1)[0].flatten()
However, I'm looking for a solution that uses NumPy only. Moreover, the solution needs to work with negative integer values as well.
Supposing m is the name of your matrix and it contains only non-negative integers (np.bincount raises an error on negative values):
most_f = np.array([np.bincount(row).argmax() for row in m])
I hope this answers your question.
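Since the question also asks for negative integers, here is a minimal NumPy-only sketch that swaps np.bincount for np.unique with return_counts=True, which accepts any integer values. The helper name row_mode is made up for illustration. It still loops over rows, and ties go to the smallest value.

```python
import numpy as np

def row_mode(m):
    """Most common value in each row (hypothetical helper name).

    On ties, the smallest of the tied values is returned, because
    np.unique sorts the values and argmax picks the first maximum.
    """
    result = []
    for row in m:
        values, counts = np.unique(row, return_counts=True)
        result.append(values[counts.argmax()])
    return np.array(result)

m = np.array([[0, 0, 1, 1],
              [1, 1, 0, 2],
              [0, 0, 1, 0],
              [0, 2, 1, 1],
              [1, 1, 1, 0],
              [1, 0, 2, 2]])
most_f = row_mode(m)  # array([0, 1, 0, 1, 1, 2])
```

The same function works unchanged on rows such as [-3, -3, 2], where np.bincount would raise a ValueError.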
If your labels are from 0 to n_labels - 1, you can use
labels_onehot = m[..., None] == np.arange(n_labels)[None, None, :]  # (n_rows, n_cols, n_labels), one-hot encoded
labels_count = np.count_nonzero(labels_onehot, axis=1)  # (n_rows, n_labels), number of occurrences of each label in each row
most_frequent = np.argmax(labels_count, axis=-1)  # (n_rows,), the most frequent label in each row
This is fully vectorized (no list comprehension, no apply_along_axis), so it is typically faster than the solutions proposed above (and kind of simpler too), at the cost of building an intermediate boolean array of shape (n_rows, n_cols, n_labels).
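If your labels are not from 0 to n_labels - 1 (for example, if some are negative), you can replace np.arange(n_labels) above with an array listing your labels, such as np.unique(m). In that case argmax returns positions in that array, so index the label array with the result to recover the actual label values.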
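As a rough sketch of the arbitrary-label case, assuming the label array is taken from np.unique(m) (one possible choice, not the only one), the vectorized version can be written like this. The sample matrix is made up to include a negative value.

```python
import numpy as np

# Made-up example data with a negative label.
m = np.array([[-1, -1, 3, 3],
              [3, 3, -1, 7],
              [-1, 7, 3, 3]])

labels = np.unique(m)                       # sorted distinct values: [-1, 3, 7]
onehot = m[..., None] == labels             # (n_rows, n_cols, n_labels)
counts = np.count_nonzero(onehot, axis=1)   # (n_rows, n_labels)

# argmax gives a position in `labels`; index back to get the value.
most_frequent = labels[np.argmax(counts, axis=-1)]  # array([-1, 3, 3])
```

Ties resolve to the smallest label, since np.unique returns sorted values and argmax picks the first maximum. Memory grows with n_rows * n_cols * n_labels, so this suits matrices with few distinct values.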