
Bitwise memmove

What is the best way to implement a bitwise memmove? The method should take an additional destination and source bit-offset and the count should be in bits too.

  • I saw that ARM provides a non-standard _membitmove, which does exactly what I need, but I couldn't find its source.
  • BIND's bitset includes isc_bitstring_copy, but it's not efficient.
  • I'm aware that the C standard library doesn't provide such a method, but I also couldn't find any third-party code providing a similar method.
turbolent asked Sep 14 '13 16:09





1 Answer

Assuming "best" means "easiest", you can copy bits one by one. Conceptually, the address of a bit is an object (a struct) that holds a pointer to a byte in memory and the index of a bit within that byte.

#include <stddef.h> // size_t
#include <stdint.h> // uint8_t

struct pointer_to_bit
{
    uint8_t* p;
    int b;
};

void membitmovebl(
    void *dest,
    const void *src,
    int dest_offset,
    int src_offset,
    size_t nbits)
{
    // Create pointers to bits
    struct pointer_to_bit d = {dest, dest_offset};
    struct pointer_to_bit s = {(uint8_t *)src, src_offset}; // cast away const; s is only read from

    // Bring the bit offsets to range (0...7)
    d.p += d.b / 8; // replace division by right-shift if bit offset can be negative 
    d.b %= 8; // replace "%=8" by "&=7" if bit offset can be negative
    s.p += s.b / 8;
    s.b %= 8;

    // Determine whether it's OK to loop forward
    if (d.p < s.p || (d.p == s.p && d.b <= s.b))
    {
        // Copy bits one by one
        for (size_t i = 0; i < nbits; i++)
        {
            // Read 1 bit
            int bit = (*s.p >> s.b) & 1;

            // Write 1 bit
            *d.p &= ~(1 << d.b);
            *d.p |= bit << d.b;

            // Advance pointers
            if (++s.b == 8)
            {
                s.b = 0;
                ++s.p;
            }
            if (++d.b == 8)
            {
                d.b = 0;
                ++d.p;
            }
        }
    }
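    else
    {
        // Copy backwards, last bit first, so that overlapping source bits
        // are read before they are overwritten
        for (size_t i = nbits; i-- > 0; )
        {
            // Bit positions relative to the normalized start pointers
            size_t si = (size_t)s.b + i;
            size_t di = (size_t)d.b + i;

            // Read 1 bit
            int bit = (s.p[si / 8] >> (si % 8)) & 1;

            // Write 1 bit
            d.p[di / 8] &= ~(1 << (di % 8));
            d.p[di / 8] |= bit << (di % 8);
        }
    }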
}
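
The comments about negative bit offsets deserve a closer look: in C99 and later, / and % truncate toward zero, so -3 % 8 is -3, not 5. Here is a minimal sketch of a normalization step that also handles negative offsets. The names bit_ptr and bit_ptr_normalize are hypothetical; they are not part of the code above.

```c
#include <stdint.h>

/* Hypothetical stand-in for pointer_to_bit: a byte pointer plus a bit index. */
struct bit_ptr
{
    uint8_t *p;
    int b;
};

/* Turn (base, offset) into a bit pointer with 0 <= b <= 7.
   Works for negative offsets as well, using floor-style rounding. */
static struct bit_ptr bit_ptr_normalize(uint8_t *base, long offset)
{
    long byte = offset / 8; /* truncates toward zero in C99 and later */
    long bit = offset % 8;  /* has the sign of offset, so it may be negative */

    if (bit < 0)
    {
        /* Step back one byte and count the bit from its low end */
        bit += 8;
        byte -= 1;
    }

    struct bit_ptr r = { base + byte, (int)bit };
    return r;
}
```

For example, an offset of -3 relative to buf + 2 ends up at byte buf + 1, bit 5. The caller still has to make sure the resulting pointer stays inside the buffer.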

If you want a version optimized for speed, you will have to copy by bytes (or, better, words), unroll loops, and handle a number of special cases (memmove does that; you will have to do more because your function is more complicated).

P.S. Oh, seeing that you call isc_bitstring_copy inefficient, you probably want the speed optimization. You can use the following idea:

Start copying bits individually until the destination is byte-aligned (d.b == 0). Then, it is easy to copy 8 bits at once, doing some bit twiddling. Do this until there are fewer than 8 bits left to copy; then continue copying bits one by one.

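// Copy 8 bits from s to d and advance pointers (requires d.b == 0)
uint8_t byte = *s.p++ >> s.b;
if (s.b != 0) // the next source byte is needed only when the run straddles two bytes
    byte |= *s.p << (8 - s.b);
*d.p++ = byte;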
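
Put together, the three phases (bit by bit until the destination is aligned, whole bytes, then the leftover bits) could look roughly like this. This is only a sketch for the forward direction: it assumes non-negative offsets and that the regions either do not overlap or the destination starts at or before the source. The names bitcopy_forward and copy_bit are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy the bit at (sp, sb) into (dp, db). */
static void copy_bit(uint8_t *dp, int db, const uint8_t *sp, int sb)
{
    int bit = (*sp >> sb) & 1;
    *dp = (uint8_t)((*dp & ~(1 << db)) | (bit << db));
}

/* Forward-only bit copy; offsets and count are in bits. */
void bitcopy_forward(void *dest, const void *src,
                     size_t dest_offset, size_t src_offset, size_t nbits)
{
    uint8_t *dp = (uint8_t *)dest + dest_offset / 8;
    const uint8_t *sp = (const uint8_t *)src + src_offset / 8;
    int db = (int)(dest_offset % 8);
    int sb = (int)(src_offset % 8);

    /* Phase 1: single bits until the destination is byte-aligned */
    while (nbits > 0 && db != 0)
    {
        copy_bit(dp, db, sp, sb);
        if (++sb == 8) { sb = 0; ++sp; }
        if (++db == 8) { db = 0; ++dp; }
        --nbits;
    }

    /* Phase 2: one whole destination byte per iteration */
    while (nbits >= 8)
    {
        uint8_t byte = (uint8_t)(*sp++ >> sb);
        if (sb != 0) /* pull the high part from the next source byte */
            byte |= (uint8_t)(*sp << (8 - sb));
        *dp++ = byte;
        nbits -= 8;
    }

    /* Phase 3: fewer than 8 bits left; db is 0 when this loop runs */
    while (nbits > 0)
    {
        copy_bit(dp, db, sp, sb);
        if (++sb == 8) { sb = 0; ++sp; }
        ++db;
        --nbits;
    }
}
```

The sb != 0 check keeps phase 2 from reading one byte past the source when both sides happen to be byte-aligned. A backward variant for the other overlap case would mirror the three phases starting from the last bit.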

P.P.S. Oh, and seeing your comment on what you are going to use the code for, you don't really need to implement all the versions (byte/halfword/word, big/little-endian); you only want the easiest one - the one working with words (uint32_t).
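
A word-based version might be sketched as follows. The assumptions are mine, not something stated in the question: bit i of the bit string lives in word i / 32 at bit position i % 32 (least significant bit first), the copy runs forward only, and the regions do not overlap (or the destination starts at or before the source). The name bitcopy_words is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Forward-only bit copy on arrays of 32-bit words; all positions are in bits. */
void bitcopy_words(uint32_t *dst, size_t dst_bit,
                   const uint32_t *src, size_t src_bit, size_t nbits)
{
    while (nbits > 0)
    {
        size_t dw = dst_bit / 32, db = dst_bit % 32;
        size_t sw = src_bit / 32, sb = src_bit % 32;

        /* Copy as many bits as fit into the current destination word */
        size_t n = 32 - db;
        if (n > nbits)
            n = nbits;

        /* Gather n source bits; they may straddle two source words */
        uint32_t v = src[sw] >> sb;
        if (sb != 0 && n > 32 - sb)
            v |= src[sw + 1] << (32 - sb);

        /* Merge them into the destination word, keeping the other bits */
        uint32_t mask = (n == 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
        dst[dw] = (dst[dw] & ~(mask << db)) | ((v & mask) << db);

        dst_bit += n;
        src_bit += n;
        nbits -= n;
    }
}
```

After the first iteration the destination is word-aligned, so every later iteration except possibly the last one writes a full word. The second source word is read only when the requested bits actually reach into it, which avoids reading past the end of the source.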

anatolyg answered Dec 21 '22 20:12
