Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find a number of occurrences of every element of a numpy array?

Tags:

python

numpy

Given an array of integers, I would like to obtain an array of the same size where every value is a number of occurrences of a corresponding element in the original array.

For example given the following array:

a = np.array([1, 1, 4, 10, 5, 3, 5, 5, 8, 9])

This should be the result:

array([2, 2, 1, 1, 3, 1, 3, 3, 1, 1])

Even though it is straightforward to achieve this via collections.Counter or built-in list.count(), I'm looking for a more performant way to work with large lists.

like image 297
Konstantin Kostanzhoglo Avatar asked Oct 20 '25 15:10

Konstantin Kostanzhoglo


2 Answers

You could use np.unique and use the parameter return_inverse and return_counts. Use return_inverse to index return_counts to get desired results.

return_inverse bool, optional If True, also return the indices of the unique array (for the specified axis, if provided) that can be used to reconstruct ar.

return_counts bool, optional If True, also return the number of times each unique item appears in ar.

_, idx, c = np.unique(a, return_inverse=True, return_counts=True)
c[idx]
# array([2, 2, 1, 1, 3, 1, 3, 3, 1, 1])
like image 113
Ch3steR Avatar answered Oct 22 '25 05:10

Ch3steR


See numpy.bincount:

Count number of occurrences of each value in array of non-negative ints.

Specifically,

np.bincount(a)[a]

will produce your desired result. bincount produces an array that maps every value 0...amax(a) to the number of occurrences of those values. You do not want to know how many times did 6 or 7 occur. Instead, it looks like you are interested to have a mapping between input values and the number of occurences. This can be achieved by indexing the output from bincount with original values from a.

Test 1:

We get the same result as in the original post:

>>> import numpy as np

>>> a = np.array([1, 1, 4, 10, 5, 3, 5, 5, 8, 9])

>>> np.all(np.equal(np.bincount(a)[a], [2, 2, 1, 1, 3, 1, 3, 3, 1, 1]))
True

Test 2

This is a modified timing test from https://stackoverflow.com/a/76444102/8033585

import numpy as np
import timeit

TRIALS = 10

a = np.random.randint(1, 1000000, 100000)

def np_bincount_version(a):
    b = np.bincount(a)
    return np.array([b[item] for item in a])

def np_bincount_this_version(a):
    return np.bincount(a)[a]

def np_unique_version(a):
    _, idx, c = np.unique(a, return_inverse=True, return_counts=True)
    return c[idx]

def np_self_compare_version(a):
    return (a[:,None]==a).sum(axis=0)

assert np.all(np.equal(np_bincount_version(a), np_bincount_this_version(a)))
assert (np_bincount_version(a) == np_unique_version(a)).all()
assert (np_unique_version(a)   == np_self_compare_version(a)).all()

np_bincount_version_time = timeit.timeit(stmt='np_bincount_version(a)', number=TRIALS, globals=globals())
np_unique_version_time = timeit.timeit(stmt='np_unique_version(a)', number=TRIALS, globals=globals())
np_bincount_this_version_time = timeit.timeit(stmt='np_bincount_this_version(a)', number=TRIALS, globals=globals())
np_self_compare_version_time = timeit.timeit(stmt='np_self_compare_version(a)', number=TRIALS, globals=globals())

min_time = min([np_bincount_version_time, np_bincount_this_version_time, np_unique_version_time, np_self_compare_version_time])

print('Absolute time:')
print(f'Bincount version: {np_bincount_version_time:.2f} seconds per {TRIALS} runs.')
print(f'**This** Bincount version: {np_bincount_this_version_time:.2f} seconds per {TRIALS} runs.')
print(f'Numpy unique version: {np_unique_version_time:.2f} seconds per {TRIALS} runs.')
print(f'Self compare version: {np_self_compare_version_time:.2f} seconds per {TRIALS} runs.')
print()
print('Normalized time:')
print(f'Bincount version: {np_bincount_version_time / min_time:.2f}x')
print(f'**This** Bincount version: {np_bincount_this_version_time / min_time:.2f}x')
print(f'Numpy unique version: {np_unique_version_time / min_time:.2f}x')
print(f'Self compare version: {np_self_compare_version_time / min_time:.2f}x')

Absolute time:
Bincount version: 0.12 seconds per 10 runs.
**This** Bincount version: 0.01 seconds per 10 runs.
Numpy unique version: 0.08 seconds per 10 runs.
Self compare version: 85.21 seconds per 10 runs.

Normalized time:
Bincount version: 16.25x
**This** Bincount version: 1.00x
Numpy unique version: 10.27x
Self compare version: 11627.32x
like image 40
AGN Gazer Avatar answered Oct 22 '25 05:10

AGN Gazer



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!