NumPy: select a fixed number of values among duplicate values in an array

Question

Starting from a simple array with duplicate values:

a = np.array([2,3,2,2,3,3,2,1])

I'm trying to keep at most 2 occurrences of each unique value. The resulting array would look like:

b = np.array([2,3,2,3,1])

The order of the items does not matter. So far I have tried finding the unique values with:

In [20]: c = np.unique(a,return_counts=True)

In [21]: c
Out[21]: (array([1, 2, 3]), array([1, 4, 3]))

This is useful because it also returns the frequency of each value, but I'm stuck on filtering by frequency.

unutbu · Accepted Answer

You could use np.repeat to generate the desired array from the array of uniques and counts:

import numpy as np

a = np.array([2,3,2,2,3,3,2,1])
uniques, count = np.unique(a,return_counts=True)
np.repeat(uniques, np.clip(count, 0, 2))

yields

array([1, 2, 2, 3, 3])

np.clip is used to force all values in count to lie between 0 and 2. Thus, you get at most two copies of each unique value.
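The same idea can be wrapped in a small helper that takes the cap as a parameter. This is a minimal sketch; the name `keep_at_most` is hypothetical. Because `np.unique` only reports values that actually occur, every count is at least 1, so capping from above with `np.minimum` gives the same result as `np.clip(count, 0, n)`. Note that the output comes back sorted, since `np.unique` returns sorted uniques.

```python
import numpy as np

def keep_at_most(a, n):
    """Return at most n copies of each distinct value in a (hypothetical helper; result is sorted)."""
    uniques, counts = np.unique(a, return_counts=True)
    # counts are always >= 1, so only the upper bound matters;
    # np.minimum(counts, n) behaves like np.clip(counts, 0, n) here
    return np.repeat(uniques, np.minimum(counts, n))

a = np.array([2, 3, 2, 2, 3, 3, 2, 1])
print(keep_at_most(a, 2))  # [1 2 2 3 3]
print(keep_at_most(a, 1))  # [1 2 3]
```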

Mazdak · Answer

You can use a list comprehension within np.concatenate() and limit the number of items by slicing:

>>> np.concatenate([a[a==i][:2] for i in np.unique(a)])
array([1, 2, 2, 3, 3])
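Both answers return the values grouped and sorted. If you want to keep the *first* n occurrences of each value in their original positions (which reproduces b exactly), one vectorized option is to rank each element among its equal values and keep those with rank below n. This is a sketch, not part of either answer above; the helper name `keep_first_n` is hypothetical, and it assumes a 1-D array. A stable `argsort` groups equal values while preserving their original relative order, so the rank within each group reflects order of appearance.

```python
import numpy as np

def keep_first_n(a, n):
    """Keep the first n occurrences of each value in a 1-D array, preserving order (hypothetical helper)."""
    a = np.asarray(a)
    if a.size == 0:
        return a
    # Stable sort: equal values become adjacent, original order kept within each group
    order = np.argsort(a, kind="stable")
    sorted_a = a[order]
    idx = np.arange(a.size)
    # True where a new run of equal values starts in the sorted array
    is_start = np.concatenate(([True], sorted_a[1:] != sorted_a[:-1]))
    # Start index of each element's run, carried forward with a running maximum
    run_start = np.maximum.accumulate(np.where(is_start, idx, 0))
    # Occurrence number (0, 1, 2, ...) within the run, scattered back to original positions
    rank = np.empty_like(idx)
    rank[order] = idx - run_start
    return a[rank < n]

a = np.array([2, 3, 2, 2, 3, 3, 2, 1])
print(keep_first_n(a, 2))  # [2 3 2 3 1]
```

This avoids the Python-level loop of the list comprehension at the cost of an O(k log k) sort, which is the same order of work `np.unique` already does internally.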
