Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

'memcpy'-like function that supports offsets by individual bits?

I was thinking about solving this, but it's looking to be quite a task. If I take this one by myself, I'll likely write it several different ways and pick the best, so I thought I'd ask this question to see if there's a good library that solves this already or if anyone has thoughts/advice.

void OffsetMemCpy(u8* pDest, u8* pSrc, u8 srcBitOffset, size size)
{
    // Or something along these lines. srcBitOffset is 0-7, so the pSrc buffer 
    // needs to be up to one byte longer than it would need to be in memcpy.
    // Maybe explicitly providing the end of the buffer is best.
    // Also note that pSrc has NO alignment assumptions at all.
}

My application is time critical so I want to nail this with minimal overhead. This is the source of the difficulty/complexity. In my case, the blocks are likely to be quite small, perhaps 4-12 bytes, so big-scale memcpy stuff (e.g. prefetch) isn't that important. The best result would be the one that benches fastest for constant 'size' input, between 4 and 12, for randomly unaligned src buffers.

  • Memory should be moved in word sized blocks whenever possible
  • Alignment of these word sized blocks is important. pSrc is unaligned, so we may need to read a few bytes off the front until it is aligned.

Anyone have, or know of, a similar implemented thing? Or does anyone want to take a stab at writing this, getting it to be as clean and efficient as possible?

Edit: It seems people are voting this "close" for "too broad". A few narrowing details would be AMD64 is the preferred architecture, so lets assume that. This means little endian etc. The implementation would hopefully fit well within the size of an answer so I don't think this is too broad. I'm asking for answers that are a single implementation at a time, even though there are a few approaches.

like image 704
VoidStar Avatar asked Aug 17 '15 06:08

VoidStar


People also ask

Does memcpy copy byte by byte?

Memcpy copies data bytes by byte from the source array to the destination array. This copying of data is threadsafe.

What can I use instead of memcpy?

memmove() is similar to memcpy() as it also copies data from a source to destination.

What is the use of memcpy in C++?

The memcpy() function in C++ copies specified bytes of data from the source to the destination. It is defined in the cstring header file.

Does memcpy clear memory?

memcpy() itself doesn't do any memory allocations. You delete what you new , and delete[] what you new[] . You do neither new nor new[] . Both source and destination arrays are allocated on the stack and will be automatically deallocated when then go out of scope.


1 Answers

I would start with a simple implementation such as this:

inline void OffsetMemCpy(uint8_t* pDest, const uint8_t* pSrc, const uint8_t srcBitOffset, const size_t size)
{
    if (srcBitOffset == 0)
    {
        for (size_t i = 0; i < size; ++i)
        {
            pDest[i] = pSrc[i];
        }
    }
    else if (size > 0)
    {
        uint8_t v0 = pSrc[0];
        for (size_t i = 0; i < size; ++i)
        {
            uint8_t v1 = pSrc[i + 1];
            pDest[i] = (v0 << srcBitOffset) | (v1 >> (CHAR_BIT - srcBitOffset));
            v0 = v1;            
        }
    }
}

(warning: untested code!).

Once this is working then profile it in your application - you may find it's plenty fast enough for your needs and thereby avoid the pitfalls of premature optimisation. If not then you have a useful baseline reference implementation for further optimisation work.

Be aware that for small copies the overhead of testing for alignment and word-sized copies etc may well outweigh any benefits, so a simple byte by byte loop such as the above may well be close to optimal.

Note also that optimisations may well be architecture-dependent - micro-optimisations which give a benefit on one CPU may well be counter-productive on another.

like image 158
Paul R Avatar answered Nov 15 '22 12:11

Paul R