numpy vectorized way to count non-zero bits in array of integers

Question

I have an array of integers:

[int1, int2, ..., intn]

I want to count how many non-zero bits are in the binary representation of these integers.

For example:

bin(123) -> 0b1111011, there are 6 non-zero bits

Of course I can loop over list of integers, use bin() and count('1') functions, but I'm looking for vectorized way to do it.

Ehsan · Accepted Answer

Assuming your array is a, you can simply do:

np.unpackbits(a.view('uint8')).sum()

example:

a = np.array([123, 44], dtype=np.uint8)
#bin(a) is [0b1111011, 0b101100]
np.unpackbits(a.view('uint8')).sum()
#9

Comparison using benchit:

#@Ehsan's solution
def m1(a):
  return np.unpackbits(a.view('uint8')).sum()

#@Valdi_Bo's solution
def m2(a):
  return sum([ bin(n).count('1') for n in a ])

in_ = [np.random.randint(100000,size=(n)) for n in [10,100,1000,10000,100000]]

m1 is significantly faster.

enter image description here

Mad Physicist · Answer

Since numpy, unlike python, has limited integer sizes, you can adapt the bit-twiddling solution proposed by Óscar López to Fast way of counting non-zero bits in positive integer (originally from here) for a portable, fast solution:

def bit_count(arr):
     # Make the values type-agnostic (as long as it's integers)
     t = arr.dtype.type
     mask = t(-1)
     s55 = t(0x5555555555555555 & mask)  # Add more digits for 128bit support
     s33 = t(0x3333333333333333 & mask)
     s0F = t(0x0F0F0F0F0F0F0F0F & mask)
     s01 = t(0x0101010101010101 & mask)

     arr = arr - ((arr >> 1) & s55)
     arr = (arr & s33) + ((arr >> 2) & s33)
     arr = (arr + (arr >> 4)) & s0F
     return (arr * s01) >> (8 * (arr.itemsize - 1))

The first part of the function truncates the quantities 0x5555..., 0x3333..., etc to the integer type that arr actually consists of. The remainder just does a set of bit-twiddling operations.

This function is about 4.5x faster than Ehsan's method and about 60x faster than Valdi Bo's for an array of 100000 elements:

a = np.random.randint(0, 0xFFFFFFFF, size=100000, dtype=np.uint32)
%timeit bit_count(a).sum()
# 846 µs ± 16.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit m1(a)
# 3.81 ms ± 24 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit m2(a)
# 49.8 ms ± 97.5 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

m1 and m2 taken as-is from @Ehsan's answer.

numpy vectorized way to count non-zero bits in array of integers

Tags:

python

arrays

vectorization

numpy

Alexander Karp

2 Answers

Ehsan

Mad Physicist

Recent Activity

Donate For Us

numpy vectorized way to count non-zero bits in array of integers

Tags:

python

arrays

vectorization

numpy

Alexander Karp

2 Answers

Ehsan

Mad Physicist

Related questions

Recent Activity

Donate For Us