
Fast way to check if a numpy array is binary (contains only 0 and 1)

Tags: python, numpy

Given a NumPy array, how can I quickly check whether it contains only 0s and 1s? Is there a built-in method for this?

SeF asked Nov 14 '16 19:11



2 Answers

A few approaches:

((a==0) | (a==1)).all()
~((a!=0) & (a!=1)).any()
np.count_nonzero((a!=0) & (a!=1))==0
a.size == np.count_nonzero((a==0) | (a==1))

Runtime test:

In [313]: a = np.random.randint(0,2,(3000,3000)) # Only 0s and 1s

In [314]: %timeit ((a==0) | (a==1)).all()
     ...: %timeit ~((a!=0) & (a!=1)).any()
     ...: %timeit np.count_nonzero((a!=0) & (a!=1))==0
     ...: %timeit a.size == np.count_nonzero((a==0) | (a==1))
     ...: 
10 loops, best of 3: 28.8 ms per loop
10 loops, best of 3: 29.3 ms per loop
10 loops, best of 3: 28.9 ms per loop
10 loops, best of 3: 28.8 ms per loop

In [315]: a = np.random.randint(0,3,(3000,3000)) # Contains 2 as well

In [316]: %timeit ((a==0) | (a==1)).all()
     ...: %timeit ~((a!=0) & (a!=1)).any()
     ...: %timeit np.count_nonzero((a!=0) & (a!=1))==0
     ...: %timeit a.size == np.count_nonzero((a==0) | (a==1))
     ...: 
10 loops, best of 3: 28 ms per loop
10 loops, best of 3: 27.5 ms per loop
10 loops, best of 3: 29.1 ms per loop
10 loops, best of 3: 28.9 ms per loop

Their runtimes seem to be comparable.
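
For reuse, the first expression can be wrapped in a small helper. This is only a sketch: the name `is_binary` is made up for illustration, and it assumes a numeric input. A boolean array always passes, since every element compares equal to 0 or 1.

```python
import numpy as np

def is_binary(a):
    """Return True if every element of `a` equals 0 or 1."""
    a = np.asarray(a)
    # Elementwise test, then reduce; this scans the whole array.
    return bool(((a == 0) | (a == 1)).all())

print(is_binary(np.array([0, 1, 1, 0])))  # True
print(is_binary(np.array([0, 1, 2])))     # False
```

Float arrays work as well: `0.0` and `1.0` pass, while any other value (including NaN) fails the check.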

Divakar answered Oct 14 '22 15:10


If you have access to Numba (or, alternatively, Cython), you can write something like the following. It is significantly faster at catching non-binary arrays because it short-circuits: it stops at the first offending element instead of scanning the rest of the array.

import numpy as np
import numba as nb

@nb.njit
def check_binary(x):
    # Visit elements one by one; nditer handles arrays of any dimension.
    for v in np.nditer(x):
        val = v.item()
        if val != 0 and val != 1:
            # Short-circuit: stop at the first non-binary value.
            return False

    return True

Run in pure Python, without an accelerator like Numba or Cython, this loop is prohibitively slow.
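
Where neither Numba nor Cython is available, one possible middle ground (not part of the original answer, and not timed here) is to scan the array in blocks with plain NumPy and stop at the first block that fails. The function name `check_binary_chunked` and the default `chunk_size` are assumptions for illustration only.

```python
import numpy as np

def check_binary_chunked(x, chunk_size=1_000_000):
    """Block-wise 0/1 check that stops at the first failing block."""
    # ravel() returns a view for contiguous input and a copy otherwise.
    flat = np.asarray(x).ravel()
    for start in range(0, flat.size, chunk_size):
        block = flat[start:start + chunk_size]
        # Vectorized test on this block only.
        if not ((block == 0) | (block == 1)).all():
            return False
    return True
```

This only short-circuits at block granularity, so a smaller `chunk_size` exits earlier on non-binary input at the cost of more Python-level loop overhead on binary input.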

Timings:

a = np.random.randint(0,2,(3000,3000)) # Only 0s and 1s

%timeit ((a==0) | (a==1)).all()
# 100 loops, best of 3: 15.1 ms per loop

%timeit check_binary(a)
# 100 loops, best of 3: 11.6 ms per loop

a = np.random.randint(0,3,(3000,3000)) # Contains 2 as well

%timeit ((a==0) | (a==1)).all()
# 100 loops, best of 3: 14.9 ms per loop

%timeit check_binary(a)
# 1000000 loops, best of 3: 543 ns per loop

JoshAdel answered Oct 14 '22 13:10