Count unique elements along an axis of a NumPy array

Tags:

I have a three-dimensional array like

A=np.array([[[1,1],
[1,0]],

[[1,2],
[1,0]],

[[1,0],
[0,0]]])

Now I would like to obtain an array that has a nonzero value in a given position if only a unique nonzero value (or zero) occurs in that position. It should have zero if only zeros or more than one nonzero value occur in that position. For the example above, I would like

[[1,0],
[1,0]]

since

in A[:,0,0] there are only 1s
in A[:,0,1] there are 0, 1 and 2, so more than one nonzero value
in A[:,1,0] there are 0 and 1, so 1 is retained
in A[:,1,1] there are only 0s

I can find how many nonzero elements there are with np.count_nonzero(A, axis=0), but I would like to keep 1s or 2s even if there are several of them. I looked at np.unique but it doesn't seem to support what I'd like to do.

Ideally, I'd like a function like np.count_unique(A, axis=0) which would return an array in the original shape, e.g. [[1, 3],[2, 1]], so I could check whether 3 or more occur and then ignore that position.

All I could come up with was a list comprehension iterating over the that I'd like to obtain

[[len(np.unique(A[:, i, j])) for j in range(A.shape[2])] for i in range(A.shape[1])]

Any other ideas?

734

asked Oct 23 '17 15:10

Raketenolli

2 Answers

You can use np.diff to stay at numpy level for the second task.

def diffcount(A):
    B=A.copy()
    B.sort(axis=0)
    C=np.diff(B,axis=0)>0
    D=C.sum(axis=0)+1
    return D

# [[1 3]
#  [2 1]]

it's seems to be a little faster on big arrays:

In [62]: A=np.random.randint(0,100,(100,100,100))

In [63]: %timeit diffcount(A)
46.8 ms ± 769 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [64]: timeit [[len(np.unique(A[:, i, j])) for j in range(A.shape[2])]\
for i in range(A.shape[1])]
149 ms ± 700 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Finally counting unique is simpler than sorting, a ln(A.shape[0]) factor can be win.

A way to win this factor is to use the set mechanism :

In [81]: %timeit np.apply_along_axis(lambda a:len(set(a)),axis=0,A) 
183 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Unfortunately, this is not faster.

Another way is to do it by hand :

def countunique(A,Amax):
    res=np.empty(A.shape[1:],A.dtype)
    c=np.empty(Amax+1,A.dtype)
    for i in range(A.shape[1]):
        for j in range(A.shape[2]):
            T=A[:,i,j]
            for k in range(c.size): c[k]=0 
            for x in T:
                c[x]=1
            res[i,j]= c.sum()
    return res

At python level:

In [70]: %timeit countunique(A,100)
429 ms ± 18.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Which is not so bad for a pure python approach. Then just shift this code at low level with numba :

import numba    
countunique2=numba.jit(countunique)  

In [71]: %timeit countunique2(A,100)
3.63 ms ± 70.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Which will be difficult to improve a lot.

116

answered Oct 19 '22 06:10

B. M.

One approach would be to use A as first axis indices for setting a boolean array of the same lengths along the other two axes and then simply counting the non-zeros along the first axis of it. Two variants would be possible - One keeping it as 3D and another would be to reshape into 2D for some performance benefit as indexing into 2D would be faster. Thus, the two implementations would be -

def nunique_axis0_maskcount_app1(A):
    m,n = A.shape[1:]
    mask = np.zeros((A.max()+1,m,n),dtype=bool)
    mask[A,np.arange(m)[:,None],np.arange(n)] = 1
    return mask.sum(0)

def nunique_axis0_maskcount_app2(A):
    m,n = A.shape[1:]
    A.shape = (-1,m*n)
    maxn = A.max()+1
    N = A.shape[1]
    mask = np.zeros((maxn,N),dtype=bool)
    mask[A,np.arange(N)] = 1
    A.shape = (-1,m,n)
    return mask.sum(0).reshape(m,n)

Runtime test -

In [154]: A = np.random.randint(0,100,(100,100,100))

# @B. M.'s soln
In [155]: %timeit f(A)
10 loops, best of 3: 28.3 ms per loop

# @B. M.'s soln using slicing : (B[1:] != B[:-1]).sum(0)+1
In [156]: %timeit f2(A)
10 loops, best of 3: 26.2 ms per loop

In [157]: %timeit nunique_axis0_maskcount_app1(A)
100 loops, best of 3: 12 ms per loop

In [158]: %timeit nunique_axis0_maskcount_app2(A)
100 loops, best of 3: 9.14 ms per loop

Numba method

Using the same strategy as used for nunique_axis0_maskcount_app2 with directly getting the counts at C-level with numba, we would have -

from numba import njit

@njit
def nunique_loopy_func(mask, N, A, p, count):
    for j in range(N):
        mask[:] = True
        mask[A[0,j]] = False
        c = 1
        for i in range(1,p):
            if mask[A[i,j]]:
                c += 1
            mask[A[i,j]] = False
        count[j] = c
    return count

def nunique_axis0_numba(A):
    p,m,n = A.shape
    A.shape = (-1,m*n)
    maxn = A.max()+1
    N = A.shape[1]
    mask = np.empty(maxn,dtype=bool)
    count = np.empty(N,dtype=int)
    out = nunique_loopy_func(mask, N, A, p, count).reshape(m,n)
    A.shape = (-1,m,n)
    return out

Runtime test -

In [328]: np.random.seed(0)

In [329]: A = np.random.randint(0,100,(100,100,100))

In [330]: %timeit nunique_axis0_maskcount_app2(A)
100 loops, best of 3: 11.1 ms per loop

# @B.M.'s numba soln
In [331]: %timeit countunique2(A,A.max()+1)
100 loops, best of 3: 3.43 ms per loop

# Numba soln posted in this post
In [332]: %timeit nunique_axis0_numba(A)
100 loops, best of 3: 2.76 ms per loop

answered Oct 19 '22 06:10

Divakar

Related questions
                            
                                Why does creating a datetime with a tzinfo from pytz show a weird time offset?
                            
                                Exporting jupyter notebook to pdf with offline plotly graph; missing graphs
                            
                                Installing packages from a list using pip
                            
                                How to increase iterations for scipy.optimize.linprog function in python?
                            
                                How to verify structure a neural network in keras model?
                            
                                Zappa not packaging nested source directories
                            
                                return the index using pandas series.sample()?
                            
                                Python program outputting different results, even though no random is used
                            
                                How do you append the values of the first column to all other columns in a pandas dataframe
                            
                                Using Python Selenium Webdriver to open Electron Application
                            
                                How to get original values after using factorize() in Python?
                            
                                Anaconda Prompt Corrupts after Installation
                            
                                Why does `head` need `()` and `shape` does not?
                            
                                Python PCA plot using Hotelling's T2 for a confidence interval
                            
                                How to create custom transport for asyncio?
                            
                                Python: Comparing two JSON objects in pytest
                            
                                Fastest way to loop over Pandas DataFrame for API calls
                            
                                Python/pandas - Using DataFrame.apply with function returning dictionary
                            
                                Lambda not supporting NLTK file size
                            
                                Getting Labels on top of Bar in Polar/Radial Bar Chart in Matplotlib, Python3

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Count unique elements along an axis of a NumPy array

Tags:

python

arrays

numpy

Raketenolli

People also ask

2 Answers

B. M.

Divakar

Recent Activity

Donate For Us