Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Binary mask with shift operation without cycle

We have some large binary number N (large means millions of digits). We also have binary mask M where 1 means that we must remove digit in this position in number N and move all higher bits one position right.

Example:

N = 100011101110
M = 000010001000
Res   1000110110

Is it possible to solve this problem without cycle with some set of logical or arithmetical operations? We can assume that we have access to bignum arithmetic in Python.

Feels like it should be something like this: Res = N - (N xor M) But it doesn't work

UPD: My current solution with cycle is following:

def prepare_reduced_arrays(dict_of_N, mask):
    '''
    mask: string '0000011000'
    each element of dict_of_N - big python integer
    '''

    capacity = len(mask)
    answer = dict()
    for el in dict_of_N:
        answer[el] = 0

    new_capacity = 0
    for i in range(capacity - 1, -1, -1):
        if mask[i] == '1':
            continue
        cap2 = (1 << new_capacity)
        pos = (capacity - i - 1)
        for el in dict_of_N:
            current_bit = (dict_of_N[el] >> pos) & 1
            if current_bit:
                answer[el] |= cap2
        new_capacity += 1

    return answer, new_capacity
like image 973
ZFTurbo Avatar asked May 22 '17 16:05

ZFTurbo


People also ask

How to bit shift left?

Left Shifts When shifting left, the most-significant bit is lost, and a 0 bit is inserted on the other end. The left shift operator is usually written as "<<".

What is a mask in binary?

A mask is a binary image consisting of zero- and non-zero values. If a mask is applied to another binary or to a grayscale image of the same size, all pixels which are zero in the mask are set to zero in the output image. All others remain unchanged.

What is Bitmask in Python?

Overview. A bit is a boolean value that can be either 0 or 1. Bitmasking is the act of applying a mask over a value to keep, change or modify a piece of given information. A mask determines which bits to take and which bits to clear off a binary number.


1 Answers

While this may not be possible without a loop in python, it can be made extremely fast with numba and just in time compilation. I went on the assumption that your inputs could be easily represented as boolean arrays, which would be very simple to construct from a binary file using struct. The method I have implemented involves iterating a few different objects, however these iterations were chosen carefully to make sure they were compiler optimized, and never doing the same work twice. The first iteration is using np.where to locate the indices of all the bits to delete. This specific function (among many others) is optimized by the numba compiler. I then use this list of bit indices to build the slice indices for slices of bits to keep. The final loop copies these slices to an empty output array.

import numpy as np
from numba import jit
from time import time

def binary_mask(num, mask):
    num_nbits = num.shape[0] #how many bits are in our big num
    mask_bits = np.where(mask)[0] #which bits are we deleting
    mask_n_bits = mask_bits.shape[0] #how many bits are we deleting
    start = np.empty(mask_n_bits + 1, dtype=int) #preallocate array for slice start indexes
    start[0] = 0 #first slice starts at 0
    start[1:] = mask_bits + 1 #subsequent slices start 1 after each True bit in mask
    end = np.empty(mask_n_bits + 1, dtype=int) #preallocate array for slice end indexes
    end[:mask_n_bits] = mask_bits #each slice ends on (but does not include) True bits in the mask
    end[mask_n_bits] = num_nbits + 1 #last slice goes all the way to the end
    out = np.empty(num_nbits - mask_n_bits, dtype=np.uint8) #preallocate return array
    for i in range(mask_n_bits + 1): #for each slice 
        a = start[i] #use local variables to reduce number of lookups
        b = end[i]
        c = a - i
        d = b - i
        out[c:d] = num[a:b] #copy slices
    return out

jit_binary_mask = jit("b1[:](b1[:], b1[:])")(binary_mask) #decorator without syntax sugar

###################### Benchmark ########################

bignum = np.random.randint(0,2,1000000, dtype=bool) # 1 million random bits
bigmask = np.random.randint(0,10,1000000, dtype=np.uint8)==9 #delete about 1 in 10 bits

t = time()
for _ in range(10): #10 cycles of just numpy implementation
    out = binary_mask(bignum, bigmask)
print(f"non-jit: {time()-t} seconds")

t = time()
out = jit_binary_mask(bignum, bigmask) #once ahead of time to compile
compile_and_run = time() - t

t = time()
for _ in range(10): #10 cycles of compiled numpy implementation
    out = jit_binary_mask(bignum, bigmask)
jit_runtime = time()-t
print(f"jit: {jit_runtime} seconds")

print(f"estimated compile_time: {compile_and_run - jit_runtime/10}")

In this example, I execute the benchmark on a boolean array of length 1,000,000 a total of 10 times for both the compiled and un-compiled version. On my laptop, the output is:

non-jit: 1.865583896636963 seconds
jit: 0.06370806694030762 seconds
estimated compile_time: 0.1652850866317749

As you can see with a simple algorithm like this, very significant performance gains can be seen from compilation. (in my case about 20-30x speedup)

like image 56
Aaron Avatar answered Sep 29 '22 03:09

Aaron