Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

numpy: most efficient frequency counts for unique values in an array

How do I efficiently obtain the frequency count for each unique value in a NumPy array?

>>> x = np.array([1,1,1,2,2,2,5,25,1,1])
>>> freq_count(x)
[(1, 5), (2, 3), (5, 1), (25, 1)]
like image 236
Abe Avatar asked Oct 04 '22 11:10

Abe


People also ask

How do you count the frequency of unique values in a NumPy array?

To count each unique element's number of occurrences in the numpy array, we can use the numpy. unique() function. It takes the array as an input argument and returns all the unique elements inside the array in ascending order.

How do I find the most frequent value in a NumPy array?

Steps to find the most frequency value in a NumPy array:Create a NumPy array. Apply bincount() method of NumPy to get the count of occurrences of each element in the array. The n, apply argmax() method to get the value having a maximum number of occurrences(frequency).

How do you find unique elements in a NumPy array?

unique() function. The unique() function is used to find the unique elements of an array. Returns the sorted unique elements of an array.


2 Answers

Use numpy.unique with return_counts=True (for NumPy 1.9+):

import numpy as np

x = np.array([1,1,1,2,2,2,5,25,1,1])
unique, counts = np.unique(x, return_counts=True)

>>> print(np.asarray((unique, counts)).T)
 [[ 1  5]
  [ 2  3]
  [ 5  1]
  [25  1]]

In comparison with scipy.stats.itemfreq:

In [4]: x = np.random.random_integers(0,100,1e6)

In [5]: %timeit unique, counts = np.unique(x, return_counts=True)
10 loops, best of 3: 31.5 ms per loop

In [6]: %timeit scipy.stats.itemfreq(x)
10 loops, best of 3: 170 ms per loop
like image 689
jme Avatar answered Oct 07 '22 00:10

jme


Take a look at np.bincount:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html

import numpy as np
x = np.array([1,1,1,2,2,2,5,25,1,1])
y = np.bincount(x)
ii = np.nonzero(y)[0]

And then:

zip(ii,y[ii]) 
# [(1, 5), (2, 3), (5, 1), (25, 1)]

or:

np.vstack((ii,y[ii])).T
# array([[ 1,  5],
         [ 2,  3],
         [ 5,  1],
         [25,  1]])

or however you want to combine the counts and the unique values.

like image 196
JoshAdel Avatar answered Oct 06 '22 23:10

JoshAdel