Count how many times each row is present in numpy.array

Tags: python, arrays, pandas, numpy

I am trying to count the number of times each row appears in a np.array, for example:

import numpy as np
my_array = np.array([[1, 2, 0, 1, 1, 1],
                     [1, 2, 0, 1, 1, 1], # duplicate of row 0
                     [9, 7, 5, 3, 2, 1],
                     [1, 1, 1, 0, 0, 0], 
                     [1, 2, 0, 1, 1, 1], # duplicate of row 0
                     [1, 1, 1, 1, 1, 0]])

Row [1, 2, 0, 1, 1, 1] shows up 3 times.

A naive solution would convert each row to a tuple and apply collections.Counter, like this:

from collections import Counter
def row_counter(my_array):
    list_of_tups = [tuple(ele) for ele in my_array]
    return Counter(list_of_tups)

Which yields:

In [2]: row_counter(my_array)
Out[2]: Counter({(1, 2, 0, 1, 1, 1): 3, (1, 1, 1, 1, 1, 0): 1, (9, 7, 5, 3, 2, 1): 1, (1, 1, 1, 0, 0, 0): 1})

However, I am concerned about the efficiency of this approach, and there may be a library that provides a built-in way of doing this. I tagged the question as pandas because I think pandas might have the tool I am looking for.

asked Nov 18 '14 17:11

Akavall

2 Answers

You can use the answer to this other question of yours to get the counts of the unique items.

In numpy 1.9 and later, np.unique accepts an optional return_counts keyword argument, so you can simply do:

>>> my_array
array([[1, 2, 0, 1, 1, 1],
       [1, 2, 0, 1, 1, 1],
       [9, 7, 5, 3, 2, 1],
       [1, 1, 1, 0, 0, 0],
       [1, 2, 0, 1, 1, 1],
       [1, 1, 1, 1, 1, 0]])
>>> dt = np.dtype((np.void, my_array.dtype.itemsize * my_array.shape[1]))
>>> b = np.ascontiguousarray(my_array).view(dt)
>>> unq, cnt = np.unique(b, return_counts=True)
>>> unq = unq.view(my_array.dtype).reshape(-1, my_array.shape[1])
>>> unq
array([[1, 1, 1, 0, 0, 0],
       [1, 1, 1, 1, 1, 0],
       [1, 2, 0, 1, 1, 1],
       [9, 7, 5, 3, 2, 1]])
>>> cnt
array([1, 1, 3, 1])
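
The view step is what lets np.unique treat a whole row as one item: each row's bytes are reinterpreted as a single opaque np.void element. A minimal sketch of just that step follows. The names small_array, row_dtype and rows_as_void are hypothetical, and the int64 dtype is fixed only to make the byte sizes predictable.

```python
import numpy as np

# Small illustrative array; dtype fixed so that each element is 8 bytes.
small_array = np.array([[1, 2, 0],
                        [1, 2, 0],   # duplicate of row 0
                        [9, 7, 5]], dtype=np.int64)

# One void element spans a full row: 3 columns * 8 bytes = 24 bytes.
row_bytes = small_array.dtype.itemsize * small_array.shape[1]
row_dtype = np.dtype((np.void, row_bytes))

# The array must be C-contiguous so that each row occupies adjacent bytes.
rows_as_void = np.ascontiguousarray(small_array).view(row_dtype)
print(rows_as_void.shape)   # one void element per row

# Identical rows have identical bytes, which is what makes them comparable.
same_bytes = rows_as_void[0].tobytes() == rows_as_void[1].tobytes()

# Viewing back with the original dtype recovers the original rows.
restored = rows_as_void.view(small_array.dtype).reshape(small_array.shape)
```

Because the comparison is done on raw bytes, the order of unq follows byte order rather than numeric order in general, so it is safest to rely only on the pairing between unq and cnt.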

In earlier versions, you can get the same counts from the inverse indices:

>>> unq, inv = np.unique(b, return_inverse=True)
>>> cnt = np.bincount(inv.ravel())  # inv[i] is the index in unq of row i
>>> unq = unq.view(my_array.dtype).reshape(-1, my_array.shape[1])
>>> unq
array([[1, 1, 1, 0, 0, 0],
       [1, 1, 1, 1, 1, 0],
       [1, 2, 0, 1, 1, 1],
       [9, 7, 5, 3, 2, 1]])
>>> cnt
array([1, 1, 3, 1])
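
To see why return_inverse plus np.bincount reproduces the counts, here is a small sketch on a plain 1-D array (the array values is made up for illustration). The inverse array holds, for each input element, its position in the sorted unique values, so counting how often each position occurs gives the count of each unique value.

```python
import numpy as np

values = np.array([3, 1, 3, 3, 2])

# unq is the sorted unique values; inv[i] is the index in unq of values[i].
unq, inv = np.unique(values, return_inverse=True)

# Counting occurrences of each index gives the count of each unique value.
cnt = np.bincount(inv)

print(unq)  # [1 2 3]
print(inv)  # [2 0 2 2 1]
print(cnt)  # [1 1 3]
```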

answered Sep 30 '22 11:09

Jaime

I think simply passing axis=0 to np.unique gives you what you need.

import numpy as np
unq, cnt = np.unique(my_array, axis=0, return_counts=True)

Note: the axis argument is available only in numpy>=1.13.0.
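
As a sketch of how this looks on the array from the question, the snippet below runs np.unique with axis=0 and then builds a dictionary keyed by row tuples, similar to the Counter output in the question. The name row_counts is hypothetical, and the conversion to a dict is only one way to present the result.

```python
import numpy as np

my_array = np.array([[1, 2, 0, 1, 1, 1],
                     [1, 2, 0, 1, 1, 1],  # duplicate of row 0
                     [9, 7, 5, 3, 2, 1],
                     [1, 1, 1, 0, 0, 0],
                     [1, 2, 0, 1, 1, 1],  # duplicate of row 0
                     [1, 1, 1, 1, 1, 0]])

# axis=0 makes np.unique compare whole rows instead of single elements.
unq, cnt = np.unique(my_array, axis=0, return_counts=True)

# Pair each unique row with its count, using plain Python ints as keys.
row_counts = {tuple(row.tolist()): int(n) for row, n in zip(unq, cnt)}

print(row_counts[(1, 2, 0, 1, 1, 1)])  # 3
```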

answered Sep 30 '22 12:09

Yuya Takashina
