Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient execution of boolean expression on bitmap in C or C++

What is the most efficient way of executing a boolean expression on a bitmap in C or C++? For example, let's say that I have a 4-bit bitmap (a, b, c, d). Now, let's say that I have a simple boolean expression like (a AND b) OR (c AND d). How should I represent the boolean expression so that I could efficiently apply it to my bitmap? I am looking for a generic solution that can be applied to any boolean expression, not just the one given as example. In other words, I am looking for some way to "compile" the boolean expression into another data structure that could be used to efficiently reduce my bitmap into a boolean.

The bitmap structure is the result of filtering operations on the records of a database. Every record has its own bitmap, and every bit in a bitmap is the result of an individual filtering rule. The boolean expression is used to combine these filtering rules to decide whether the record should be included in the results of a database query. There can be up to 64 individual filtering rules that can be combined by the boolean operation, hence the bitmap can be represented as an unsigned long long int if necessary.

The solution should be efficient in terms of speed and should not alter the bitmap structure. The conversion of the boolean expression into another structure does not have to be memory-efficient nor fast, because it can be cached (at least in my current use case). The reducing of the bitmap with the transformed boolean expression should be both fast and memory efficient.

Notes:

  • The boolean expression is only using nested AND and OR operations (no IF statements).
  • The solution should assume the availability of a 64bit CPU.
  • The solution should not be CPU-dependent (beside 64bit addressing).
  • The solution should not assume the availability of any other particular hardware (eg GPU).
  • All the bitmaps are in memory.
  • There can be a very large number of bitmaps (billions).
  • Bitmaps are updated one at a time.
like image 772
Ismael Ghalimi Avatar asked Oct 20 '22 00:10

Ismael Ghalimi


2 Answers

The most efficient method of using AND or OR operations on bitmaps is to use hardware assistance. Many graphics processors can perform operations on two bitmaps. There is no C++ standard library operation for this.

You need to perform the operation on each bit, byte, word or double word in the bitmaps.

The next speed efficient method is to unroll the loop. Branch instructions waste execution cycles (that could be used for data instructions) and may clear out the instruction pipeline wasting time reloading it.

You can also gain some efficiency by effectively using the processor's data cache. Load up a bunch of variables, perform the operation, store the result, repeat.

You should also fetch in groups using the processor's word size. A 32-bit processor loves to fetch 32-bits at a time. So this would give you 8 sets of 4-bit pixels that are loaded with one fetch. Otherwise, you would have to fetch 8 bits at a time, which results in 4 fetches of 8 bits as compared to 1 fetch of 32-bits.

Here's the core algorithm:

uint8_t * p_bitmap_a = &Bitmap_A[0];
uint8_t * p_bitmap_b = &Bitmap_B[0];
uint8_t * p_bitmap_c = &Bitmap_C[0];

// C = A AND B

for (unsigned int i = 0; i < bitmap_size / 4; ++i)
{
  uint32_t  a = *((uint32_t*) p_bitmap_a);
  uinte2_t  b = *((uint32_t*) p_bitmap_b);
  uint32_t  c = a & b;
  *((uint32_t *) p_bitmap_c) = c;
  p_bitmap_a += sizeof(uint32_t);
  p_bitmap_b += sizeof(uint32_t);
  p_bitmap_c += sizeof(uint32_t);
}

Edit 1:
Your processor may have instructions that can assist with the operations. For example, the ARM7 processor can load many registers from memory with one instruction. Research your processors instruction set. You may have to use inline assembly language to take advantage of processor specific instructions.

Edit 2: Threading & Parallel processing.

Unless your bitmaps are huge, the overhead of maintaining multiple threads of execution or parallel execution may outweigh the benefit. For example, if the overhead of synchronizing with another CPU core is 200ms and processing of the bitmap without interruption is 1000ms, you've wasted time by using parallel processing on the single bitmap (1200 ms to have another core process the bitmap).

If you have many bitmaps, you may gain some time by using parallel processing or multiple threads:

  1. One thread fetches bitmaps from the database into memory (buffers).
  2. Another thread processes the bitmaps and stores into an outgoing buffer.
  3. A third process writes the buffered bitmaps to the database.

If you are fetching bitmaps from an external source, such as a database, this I/O will be your bottleneck. This is the part you should optimize or spool.

like image 174
Thomas Matthews Avatar answered Oct 30 '22 22:10

Thomas Matthews


If the bitmaps are GUARANTEED to always be 4 bits, then they will fit in the lower 4 bits of a char, and there will only be 16 possible values for any bitmap.

For a particular boolean expression, you then evaluate it for each of the sixteen possible bit combinations, which gives you a set of sixteen result bits. Assemble them into a sixteen bit int: false, false, false, false in bit zero, false, false, false, true in bit 1, and so on.

Now for an arbitrary bitmap versus an arbitrary boolean, your check becomes:

  1. Treat the bitmap as a 4 bit int, evaluate 1 << (4 bit int).
  2. Take the result of that shift and use the C++ & operator to test against the boolean operation's cached 16 bit int value.

That'll return == 0 for false and != 0 for true.

Reducing it to two instructions: a shift and an and is about the fastest I can see doing it.

This assumes that there are a fairly small number of boolean operation that you are applying over an over, the setup per boolean test will be expensive, but since you're talking billions of bitmaps, I'm assuming that you'll be using the same boolean operation on many many bitmaps.

like image 40
dgnuff Avatar answered Oct 30 '22 21:10

dgnuff