Given an array of integers, I would like to obtain an array of the same size where every value is the number of occurrences of the corresponding element in the original array.
For example given the following array:
a = np.array([1, 1, 4, 10, 5, 3, 5, 5, 8, 9])
This should be the result:
array([2, 2, 1, 1, 3, 1, 3, 3, 1, 1])
Even though it is straightforward to achieve this via collections.Counter or the built-in list.count(), I'm looking for a more performant way to work with large lists.
You could use np.unique with the parameters return_inverse=True and return_counts=True, then use the inverse indices to index into the counts and get the desired result. From the documentation:
return_inverse : bool, optional
If True, also return the indices of the unique array (for the specified axis, if provided) that can be used to reconstruct ar.

return_counts : bool, optional
If True, also return the number of times each unique item appears in ar.
_, idx, c = np.unique(a, return_inverse=True, return_counts=True)
c[idx]
# array([2, 2, 1, 1, 3, 1, 3, 3, 1, 1])
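To make the mechanics concrete, here is a small sketch (using the example array from the question) of the intermediate arrays that np.unique returns:

import numpy as np

a = np.array([1, 1, 4, 10, 5, 3, 5, 5, 8, 9])
u, idx, c = np.unique(a, return_inverse=True, return_counts=True)
u       # array([ 1,  3,  4,  5,  8,  9, 10]) - sorted unique values
idx     # array([0, 0, 2, 6, 3, 1, 3, 3, 4, 5]) - position of each a[i] in u
c       # array([2, 1, 1, 3, 1, 1, 1]) - occurrences of each unique value
c[idx]  # array([2, 2, 1, 1, 3, 1, 3, 3, 1, 1])

Since idx maps every element of a back to its slot in the unique array, c[idx] expands the per-unique counts back to the original shape.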
See numpy.bincount:
Count number of occurrences of each value in array of non-negative ints.
Specifically,
np.bincount(a)[a]
will produce your desired result. bincount produces an array that maps every value in the range 0...amax(a) to the number of times that value occurs. You do not care how many times 6 or 7 occurred; rather, you want a mapping from each input value to its occurrence count. That is achieved by indexing the output of bincount with the original values from a.
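For the example array, the intermediate output looks like this (a short illustration: values such as 0, 2, 6 and 7 that never occur simply get a count of 0 and are never selected by the indexing step):

import numpy as np

a = np.array([1, 1, 4, 10, 5, 3, 5, 5, 8, 9])
b = np.bincount(a)
b     # array([0, 2, 0, 1, 1, 3, 0, 0, 1, 1, 1]) - counts for values 0..10
b[a]  # array([2, 2, 1, 1, 3, 1, 3, 3, 1, 1])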
We get the same result as in the original post:
>>> import numpy as np
>>> a = np.array([1, 1, 4, 10, 5, 3, 5, 5, 8, 9])
>>> np.all(np.equal(np.bincount(a)[a], [2, 2, 1, 1, 3, 1, 3, 3, 1, 1]))
True
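One caveat: np.bincount only accepts non-negative integers, and it allocates an output of length amax(a) + 1, which can be wasteful when the values are large and sparse. If the array may contain negative values, a possible workaround (a sketch, not part of the original answer) is to shift the array by its minimum before counting:

import numpy as np

a = np.array([-2, -2, 0, 3, 1, -2])
shifted = a - a.min()          # smallest value now maps to bin 0
np.bincount(shifted)[shifted]  # array([3, 3, 1, 1, 1, 3])

For such inputs the np.unique approach above also works unchanged.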
Below is a modified timing test, adapted from https://stackoverflow.com/a/76444102/8033585:
import numpy as np
import timeit
TRIALS = 10
a = np.random.randint(1, 1000000, 100000)
def np_bincount_version(a):
    # Original answer: look up each element's count in a Python-level loop.
    b = np.bincount(a)
    return np.array([b[item] for item in a])

def np_bincount_this_version(a):
    # This answer: vectorized fancy indexing into the bincount output.
    return np.bincount(a)[a]

def np_unique_version(a):
    # np.unique answer: index the per-unique-value counts with the inverse indices.
    _, idx, c = np.unique(a, return_inverse=True, return_counts=True)
    return c[idx]

def np_self_compare_version(a):
    # Baseline: O(n^2) broadcast comparison of every element against all others.
    return (a[:, None] == a).sum(axis=0)
assert np.all(np.equal(np_bincount_version(a), np_bincount_this_version(a)))
assert (np_bincount_version(a) == np_unique_version(a)).all()
assert (np_unique_version(a) == np_self_compare_version(a)).all()
np_bincount_version_time = timeit.timeit(stmt='np_bincount_version(a)', number=TRIALS, globals=globals())
np_unique_version_time = timeit.timeit(stmt='np_unique_version(a)', number=TRIALS, globals=globals())
np_bincount_this_version_time = timeit.timeit(stmt='np_bincount_this_version(a)', number=TRIALS, globals=globals())
np_self_compare_version_time = timeit.timeit(stmt='np_self_compare_version(a)', number=TRIALS, globals=globals())
min_time = min([np_bincount_version_time, np_bincount_this_version_time, np_unique_version_time, np_self_compare_version_time])
print('Absolute time:')
print(f'Bincount version: {np_bincount_version_time:.2f} seconds per {TRIALS} runs.')
print(f'**This** Bincount version: {np_bincount_this_version_time:.2f} seconds per {TRIALS} runs.')
print(f'Numpy unique version: {np_unique_version_time:.2f} seconds per {TRIALS} runs.')
print(f'Self compare version: {np_self_compare_version_time:.2f} seconds per {TRIALS} runs.')
print()
print('Normalized time:')
print(f'Bincount version: {np_bincount_version_time / min_time:.2f}x')
print(f'**This** Bincount version: {np_bincount_this_version_time / min_time:.2f}x')
print(f'Numpy unique version: {np_unique_version_time / min_time:.2f}x')
print(f'Self compare version: {np_self_compare_version_time / min_time:.2f}x')
Absolute time:
Bincount version: 0.12 seconds per 10 runs.
**This** Bincount version: 0.01 seconds per 10 runs.
Numpy unique version: 0.08 seconds per 10 runs.
Self compare version: 85.21 seconds per 10 runs.
Normalized time:
Bincount version: 16.25x
**This** Bincount version: 1.00x
Numpy unique version: 10.27x
Self compare version: 11627.32x