Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Bit counting in a contiguous memory chunk

I was asked in an interview the following question.

int countSetBits(void *ptr, int start, int end); 

Synopsis: Assume that ptr points to a big chunk of memory. Viewing this memory as contiguous sequence of bits, start and end are bit positions. Assume start and end have proper values and ptr is pointing to an initialized chunck of memory.

Question: Write a C code to count number of bits set from start to end [inclusive] and return the count.

Just to make it more clear

 ptr---->+-------------------------------+
         | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
         +-------------------------------+
         | 8 | 9 |                   |15 |
         +-------------------------------+
         |                               |
         +-------------------------------+
              ...
              ...
         +-------------------------------+
         |               | S |           |
         +-------------------------------+
              ...
              ...
         +-------------------------------+
         |    | E |                      |
         +-------------------------------+
              ...
              ...

My solution:

int countSetBits(void *ptr, int start, int end )
{
    int count = 0, idx; 

    char *ch; 

    for (idx = start; idx <= end; idx++) 
    {     ch = ptr + (idx/8); 

          if((128 >> (idx%8)) & (*ch)) 
          {
                   count++; 
          }
    }

    return count; 
}

I gave a very lengthy and somewhat inefficient code during the interview. I worked on it later and came up with above solution.

I am very sure SO community can provide more elegant solution. I am just curious to see their response.

PS: Above code is not compiled. It is more like a pseudo code and may contain errors.

like image 423
Bhaskar Avatar asked Aug 27 '11 07:08

Bhaskar


People also ask

What are contiguous bits?

A “byte” is a contiguous sequence of a fixed number of bits that is used as a unit of memory, storage, and instructions execution in computers.

What is contiguous memory C++?

First of all contiguous memory means a chunk of memory allocated without any gaps in the addresses it occupies. It will be one single "block" of memory. Contiguous memory in C++ would mean various ways of allocating contiguous memory in C++.


Video Answer


4 Answers

The most quick and efficient way to my opinion is to use a table of 256 entries, where every element represents number of bits in the index. Index is a next byte from the memory location.

something like this:

int bit_table[256] = {0, 1, 1, 2, 1, ...};
char* p = ptr + start;
int count = 0;
for (p; p != ptr + end; p++)
    count += bit_table[*(unsigned char*)p];
like image 177
dimitri Avatar answered Oct 13 '22 08:10

dimitri


Boundary conditions, they get no respect...

Everyone here seems to be concentrating on the lookup table to count the bits. And that's OK, but I think that even more important when answering an interview question is to make sure you handle the boundary conditions.

The look up table is just an optimization. It's much more important to get the answer right than to get it fast. If this were my interview, going straight for the lookup table without even mentioning that there are some tricky details about handling the first few and last few bits that aren't on full-byte boundaries would be worse than coming up with a solution that counted each bit ploddingly, but got the boundary conditions right.

So I think Bhaskar's solution in his question is probably superior to the most of the answers mentioned here - it seems to handle the boundary conditions.

Here's a solution that uses a lookup table and tries to still handle the boundaries (it's only lightly tested, so I won't claim that it's 100% correct). It's also uglier than I'd like, but it's late:

typedef unsigned char uint8_t;

static
size_t bits_in_byte( uint8_t val)
{
    static int const half_byte[] = { 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4 };

    int result1 = half_byte[val & 0x0f];
    int result2 = half_byte[(val >> 4) & 0x0f];

    return result1 + result2;
}


int countSetBits( void* ptr, int start, int end) 
{
    uint8_t*    first;
    uint8_t*    last;
    int         bits_first;
    int         bits_last;
    uint8_t     mask_first;
    uint8_t     mask_last;

    size_t count = 0;

    // get bits from the first byte
    first = ((uint8_t*) ptr) + (start / 8);
    bits_first = 8 - start % 8;
    mask_first = (1 << bits_first) - 1;
    mask_first = mask_first << (8 - bits_first);


    // get bits from last byte
    last = ((uint8_t*) ptr) + (end / 8);
    bits_last = 1 + (end % 8);
    mask_last = (1 << bits_last) - 1;

    if (first == last) {
        // we only have a range of bits in  the first byte
        count = bits_in_byte( (*first) & mask_first & mask_last);        
    }
    else {
        // handle the bits from the first and last bytes specially
        count += bits_in_byte((*first) & mask_first);
        count += bits_in_byte((*last) & mask_last);

        // now we've collected the odds and ends from the start and end of the bit range
        // handle the full bytes in the interior of the range

        for (first = first+1; first != last; ++first) {
            count += bits_in_byte(*first);
        }
    }

    return count;
}

Note that a detail that would have to be worked out as part of the interview is whether the bits within a byte are indexed starting at the least-significant-bit (lsb) or most-significant-bit (msb). In other words, if the start index were specified as 0, would a byte with the value 0x01 or a byte with the value 0x80 have the bit set in that index? Sort of like deciding whether the indexes consider the bit order within a byte as big-endian or little-endian.

There's no 'right' answer for this - the interviewer would have to specify what the behavior should be. I'll also note that my example solution handles this in the opposite way to the OP's example code (I was going by how I interpreted the diagram, with the indexes reading as 'bit numbers' as well). The OPs' solution considers the bit order as big-endian, my function treats them as little-endian. So even though both handle partial bytes at the star & end of the range, they'll give different answers. Which is the right answer depends on what the actual spec for the problem is.

like image 40
Michael Burr Avatar answered Oct 13 '22 09:10

Michael Burr


The version of @dimitri is likely the fastest. But it is difficult to build the table of bit counts for all 128 8-bit chars in an interview. You can get a very fast version with a table for 16 hex numbers 0x0, 0x1, ..., 0xF, that you can build easily:

int countBits(void *ptr, int start, int end) {
    // start, end are byte indexes
    int hexCounts[16] =   {0, 1, 1, 2,   1, 2, 2, 3,
                           1, 2, 3, 3,   2, 3, 3, 4}; 
    unsigned char * pstart = (unsigned char *) ptr + start;
    unsigned char * pend = (unsigned char *) ptr + end;
    int count = 0;
    for (unsigned char * p = pstart; p <= pend; ++p) {
        unsigned char b = *p;
        count += hexCounts[b & 0x0F] + hexCounts[(b >> 4) & 0x0F];
    }
    return count;
}

EDIT: If start and end are bit indexes then the bits in the first and last bytes would be counted first before the above function is called:

int countBits2(void *ptr, int start, int end) {
    // start, end are bit indexes
    if (start > end) return 0;
    int count = 0;
    unsigned char* pstart = (unsigned char *) ptr + start/8; // first byte
    unsigned char* pend = (unsigned char *) ptr + end/8;     // last byte
    int istart = start % 8;                                  // index in first byte
    int iend = end % 8;                                      // index in last byte 
    unsigned char b = *pstart;                               // byte
    if (pstart == pend) {                                    // count in 1 byte only
        b = b << istart;
        for (int i = istart; i <= iend; ++i) {               // between istart, iend
            if (b & 0x80) ++count; 
            b = b << 1;
        }
    }
    else {                                                   // count in 2 bytes
        for (int i = istart; i < 8; ++i) {                   // from istart to 7
            if (b & 1) ++count; 
            b = b >> 1;
        }
        b = *pend;
        for (int i = 0; i <= iend; ++i) {                    // from 0 to iend
            if (b & 0x80) ++count; 
            b = b << 1;
        }
    }
    return count + countBits(ptr, start/8 + 1, end/8 - 1);
}
like image 39
Jiri Kriz Avatar answered Oct 13 '22 09:10

Jiri Kriz


An excellent recent study comparing several of the most modern techniques for counting the number of 'set' (1-valued) bits in a range of memory (aka Hamming Weight, bitset cardinality, sideways sum, population count or popcnt, etc.) can be found in Wojciech, Kurz, and Lemire (2017), Faster population counts using AVX2 instructions1

The following is a complete, tested, and fully-working C# adaptation of the "Harley-Seal" algorithm from that paper, which the authors found to be the fastest method that uses general-purpose bitwise operations (that is, that doesn't require special hardware).

1. Managed array entry points
(optional) Provides access to the block-optimized bit-counting for managed array ulong[].

/// <summary> Returns the total number of 1-valued bits in the array </summary>
[DebuggerStepThrough]
public static int OnesCount(ulong[] rg) => OnesCount(rg, 0, rg.Length);

/// <summary> Finds the total number of '1' bits in an array or its subset </summary>
/// <param name="rg"> Array of ulong values to scan </param>
/// <param name="index"> Starting index in the array </param>
/// <param name="count"> Number of ulong values to examine, starting at 'i' </param>
public static int OnesCount(ulong[] rg, int index, int count)
{
    if ((index | count) < 0 || index > rg.Length - count)
        throw new ArgumentException();

    fixed (ulong* p = &rg[index])
        return OnesCount(p, count);
}

2. Scalar API
Used by the block-optimized counter to aggregate results from the carry-save adder, and also to finish up any remainder for block sizes not divisible by the optimized chunk size of 16 x 8 bytes/ulong = 128 bytes. Suitable for general-purpose use also.

/// <summary> Finds the Hamming Weight or ones-count of a ulong value </summary>
/// <returns> The number of 1-bits that are set in 'x' </returns>
public static int OnesCount(ulong x)
{
    x -= (x >> 1) & 0x5555555555555555;
    x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333);
    return (int)((((x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F) * 0x0101010101010101) >> 56);
}

3. "Harley-Seal" block-optimized 1s-bit counter
Processes blocks of 128 bytes at a time, i.e., 16 ulong values per block. Uses the carry-save adder (shown below) to gang-add single bits across adjacent ulongs, and aggregates totals upwards as powers of two.

/// <summary> Count the number of 'set' (1-valued) bits in a range of memory. </summary>
/// <param name="p"> Pointer to an array of 64-bit ulong values to scan </param>
/// <param name="c"> Size of the memory block as a count of 64-bit ulongs </param>
/// <returns> The total number of 1-bits </returns>
public static int OnesCount(ulong* p, int c)
{
    ulong z, y, x, w;
    int c = 0;

    for (w = x = y = z = 0UL; cq >= 16; cq -= 16)
        c += OnesCount(CSA(ref w,
                            CSA(ref x,
                                CSA(ref y,
                                    CSA(ref z, *p++, *p++),
                                    CSA(ref z, *p++, *p++)),
                                CSA(ref y,
                                    CSA(ref z, *p++, *p++),
                                    CSA(ref z, *p++, *p++))),
                            CSA(ref x,
                                CSA(ref y,
                                    CSA(ref z, *p++, *p++),
                                    CSA(ref z, *p++, *p++)),
                                CSA(ref y,
                                    CSA(ref z, *p++, *p++),
                                    CSA(ref z, *p++, *p++)))));

    c <<= 4;
    c += (OnesCount(w) << 3) + (OnesCount(x) << 2) + (OnesCount(y) << 1) + OnesCount(z);

    while (--cq >= 0)
        c += OnesCount(*p++);

    return c;
}

4. Carry-save adder (CSA)

/// <summary> carry-save adder </summary>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
static ulong CSA(ref ulong a, ulong b, ulong c)
{
    ulong v = a & b | (a ^ b) & c;
    a ^= b ^ c;
    return v;
}


Remarks

Because the approach shown here counts the total number of 1-bits by proceeding 128-byte chunks at a time, it only becomes optimal with larger memory block sizes. For example, likely at least some (small) multiple of that sixteen-qword (16-ulong) chunk size. For counting 1-bits in smaller memory ranges, this code will work correctly, but drastically underperform more naïve methods. See the paper for details.

From the paper, this diagram summarizes how the Carry-Save Adder works:

Carry-Save Adder in 'Harley-Seal' block-optimized bit count


References

[1.] Muła, Wojciech, Nathan Kurz, and Daniel Lemire. "Faster population counts using AVX2 instructions." The Computer Journal 61, no. 1 (2017): 111-120.

like image 31
Glenn Slayden Avatar answered Oct 13 '22 08:10

Glenn Slayden