Is there a memset() that accepts integers larger than char?

Tags:

optimization

Is there a version of memset() which sets a value that is larger than 1 byte (char)? For example, let's say we have a memset32() function, so using it we can do the following:

    int32_t array[10];
    memset32(array, 0xDEADBEEF, sizeof(array));

This will set the value 0xDEADBEEF in all the elements of array. Currently it seems to me this can only be done with a loop.

Specifically, I am interested in a 64 bit version of memset(). Know anything like that?


asked Sep 20 '08 17:09

2 Answers

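    #include <stdint.h>
    #include <string.h>

    void memset64( void * dest, uint64_t value, uintptr_t size )
    {
      uintptr_t i;
      /* Whole 8-byte chunks: memcpy() copes with an unaligned dest. */
      for( i = 0; i < (size & (~7)); i+=8 )
      {
        memcpy( ((char*)dest) + i, &value, 8 );
      }
      /* Trailing 1-7 bytes when size is not a multiple of 8. */
      for( ; i < size; i++ )
      {
        ((char*)dest)[i] = ((char*)&value)[i&7];
      }
    }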

(Explanation, as requested in the comments: when you assign through a pointer, the compiler assumes that the pointer is aligned to the type's natural alignment; for uint64_t, that is 8 bytes. memcpy() makes no such assumption. On some hardware unaligned accesses are impossible, so assignment is not a suitable solution unless you know unaligned accesses work on the hardware with small or no penalty, or know that they will never occur, or both. The compiler will replace small memcpy()s and memset()s with more suitable code, so it is not as horrible as it looks; but if you do know enough to guarantee assignment will always work and your profiler tells you it is faster, you can replace the memcpy with an assignment. The second for() loop is present in case the amount of memory to be filled is not a multiple of 64 bits. If you know it always will be, you can simply drop that loop.)
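To make the role of the two loops concrete, here is a small sketch that fills 13 bytes starting at an odd address, so both the unaligned memcpy() path and the tail loop get exercised. demo_unaligned_fill() is a made-up name for illustration; memset64() is repeated only so the snippet compiles on its own.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* memset64() as given in the answer above. */
void memset64(void *dest, uint64_t value, uintptr_t size)
{
    uintptr_t i;
    for (i = 0; i < (size & (~7)); i += 8) {
        memcpy(((char *)dest) + i, &value, 8);
    }
    for (; i < size; i++) {
        ((char *)dest)[i] = ((char *)&value)[i & 7];
    }
}

/* Hypothetical check: fill 13 bytes starting at an odd offset, then
   confirm every byte repeats the in-memory bytes of `value` and that
   the bytes on either side are untouched. */
void demo_unaligned_fill(void)
{
    unsigned char buf[16];
    const uint64_t value = 0x0123456789ABCDEFULL;
    unsigned char pattern[8];

    memset(buf, 0, sizeof buf);
    memcpy(pattern, &value, sizeof pattern);  /* byte order as stored */

    memset64(buf + 1, value, 13);  /* one full chunk + 5 tail bytes */

    assert(buf[0] == 0);
    for (unsigned i = 0; i < 13; i++) {
        assert(buf[1 + i] == pattern[i & 7]);
    }
    assert(buf[14] == 0 && buf[15] == 0);
}
```

The comparison goes through a memcpy()'d copy of value rather than hard-coded bytes, so the check does not depend on the machine's endianness.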

answered Sep 22 '22 19:09

moonshadow

There's no standard library function afaik. So if you're writing portable code, you're looking at a loop.
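For reference, the portable loop is about as small as it sounds. This is only a sketch: fill_u32() is a hypothetical name, and unlike the memset32() imagined in the question it takes an element count rather than a byte count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a standard function: store `value` into
   each of the `count` elements starting at `dst`. */
void fill_u32(uint32_t *dst, uint32_t value, size_t count)
{
    assert(dst != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        dst[i] = value;
    }
}
```

It would be called as fill_u32(array, 0xDEADBEEFu, sizeof array / sizeof array[0]) for a uint32_t array. Because dst is a typed pointer, alignment is the caller's problem by construction.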

If you're writing non-portable code then check your compiler/platform documentation, but don't hold your breath because it's rare to get much help here. Maybe someone else will chip in with examples of platforms which do provide something.

The way you'd write your own depends on whether you can define in the API that the caller guarantees the dst pointer will be sufficiently aligned for 64-bit writes on your platform (or platforms if portable). On any platform that has a 64-bit integer type at all, malloc at least will return suitably-aligned pointers.

If you have to cope with non-alignment, then you need something like moonshadow's answer. The compiler may inline/unroll that memcpy with a size of 8 (and use 32- or 64-bit unaligned write ops if they exist), so the code should be pretty nippy, but my guess is it probably won't special-case the whole function for the destination being aligned. I'd love to be corrected, but fear I won't be.

So if you know that the caller will always give you a dst with sufficient alignment for your architecture, and a length which is a multiple of 8 bytes, then do a simple loop writing a uint64_t (or whatever the 64-bit int is in your compiler) and you'll probably (no promises) end up with faster code. You'll certainly have shorter code.
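A sketch of that simple loop, under exactly those two assumptions. memset64_aligned() is a made-up name, the asserts are only there to document the contract in debug builds, and _Alignof needs C11 or later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fast path. The caller promises that `dest` is aligned
   for uint64_t and that `size` (in bytes) is a multiple of 8. */
void memset64_aligned(void *dest, uint64_t value, size_t size)
{
    assert((uintptr_t)dest % _Alignof(uint64_t) == 0);
    assert(size % sizeof(uint64_t) == 0);

    uint64_t *p = dest;
    size_t n = size / sizeof(uint64_t);  /* number of 64-bit stores */
    for (size_t i = 0; i < n; i++) {
        p[i] = value;
    }
}
```

One caveat beyond alignment: storing through a uint64_t * into an object declared with some other type can fall foul of C's aliasing rules, so this form is safest on uint64_t arrays or malloc'd memory.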

Whatever the case, if you do care about performance then profile it. If it's not fast enough try again with more optimisation. If it's still not fast enough, ask a question about an asm version for the CPU(s) on which it's not fast enough. memcpy/memset can get massive performance increases from per-platform optimisation.
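If no profiler is to hand, a crude stopwatch around the candidates is a starting point. The sketch below uses the standard clock() function; time_fill(), fill_fn and fill_memcpy() are hypothetical names, and the numbers it gives are rough (CPU time, no warm-up, and an optimiser may still elide work it can prove is unused).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Signature shared by the fill routines being compared. */
typedef void (*fill_fn)(void *dest, uint64_t value, size_t size);

/* Candidate to time: 8-byte memcpy() per chunk; `size` is assumed
   to be a multiple of 8 to keep the sketch short. */
void fill_memcpy(void *dest, uint64_t value, size_t size)
{
    for (size_t i = 0; i + 8 <= size; i += 8) {
        memcpy((char *)dest + i, &value, 8);
    }
}

/* Run `fn` over the buffer `repeats` times and return the elapsed
   CPU time in seconds. The fill value changes on every pass so the
   passes are not trivially identical. */
double time_fill(fill_fn fn, void *buf, size_t size, int repeats)
{
    assert(repeats > 0);
    clock_t start = clock();
    for (int r = 0; r < repeats; r++) {
        fn(buf, (uint64_t)r, size);
    }
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Timing each candidate with the same buffer size and repeat count, on the actual target hardware and at the optimisation level you ship with, gives a fairer comparison than reading the generated code.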

answered Sep 18 '22 19:09

Steve Jessop
