I am trying to count a number each row shows in a np.array
, for example:
import numpy as np
my_array = np.array([[1, 2, 0, 1, 1, 1],
[1, 2, 0, 1, 1, 1], # duplicate of row 0
[9, 7, 5, 3, 2, 1],
[1, 1, 1, 0, 0, 0],
[1, 2, 0, 1, 1, 1], # duplicate of row 0
[1, 1, 1, 1, 1, 0]])
Row [1, 2, 0, 1, 1, 1]
shows up 3 times.
A simple naive solution would involve converting all my rows to tuples, and applying collections.Counter
, like this:
from collections import Counter
def row_counter(my_array):
list_of_tups = [tuple(ele) for ele in my_array]
return Counter(list_of_tups)
Which yields:
In [2]: row_counter(my_array)
Out[2]: Counter({(1, 2, 0, 1, 1, 1): 3, (1, 1, 1, 1, 1, 0): 1, (9, 7, 5, 3, 2, 1): 1, (1, 1, 1, 0, 0, 0): 1})
However, I am concerned about the efficiency of my approach. And maybe there is a library that provides a built-in way of doing this. I tagged the question as pandas
because I think that pandas
might have the tool I am looking for.
Operator. countOf() is used for counting the number of occurrences of b in a. It counts the number of occurrences of value. It returns the Count of a number of occurrences of value.
NumPy: count() function count() function returns an array with the number of non-overlapping occurrences of substring sub in the range [start, end].
Steps to find the most frequency value in a NumPy array:Create a NumPy array. Apply bincount() method of NumPy to get the count of occurrences of each element in the array. The n, apply argmax() method to get the value having a maximum number of occurrences(frequency).
You can use the answer to this other question of yours to get the counts of the unique items.
In numpy 1.9 there is a return_counts
optional keyword argument, so you can simply do:
>>> my_array
array([[1, 2, 0, 1, 1, 1],
[1, 2, 0, 1, 1, 1],
[9, 7, 5, 3, 2, 1],
[1, 1, 1, 0, 0, 0],
[1, 2, 0, 1, 1, 1],
[1, 1, 1, 1, 1, 0]])
>>> dt = np.dtype((np.void, my_array.dtype.itemsize * my_array.shape[1]))
>>> b = np.ascontiguousarray(my_array).view(dt)
>>> unq, cnt = np.unique(b, return_counts=True)
>>> unq = unq.view(my_array.dtype).reshape(-1, my_array.shape[1])
>>> unq
array([[1, 1, 1, 0, 0, 0],
[1, 1, 1, 1, 1, 0],
[1, 2, 0, 1, 1, 1],
[9, 7, 5, 3, 2, 1]])
>>> cnt
array([1, 1, 3, 1])
In earlier versions, you can do it as:
>>> unq, _ = np.unique(b, return_inverse=True)
>>> cnt = np.bincount(_)
>>> unq = unq.view(my_array.dtype).reshape(-1, my_array.shape[1])
>>> unq
array([[1, 1, 1, 0, 0, 0],
[1, 1, 1, 1, 1, 0],
[1, 2, 0, 1, 1, 1],
[9, 7, 5, 3, 2, 1]])
>>> cnt
array([1, 1, 3, 1])
I think just specifying axis
in np.unique
gives what you need.
import numpy as np
unq, cnt = np.unique(my_array, axis=0, return_counts=True)
Note: this feature is available only in numpy>=1.13.0
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With