
Is there memset() that accepts integers larger than char?

Tags: c, optimization

Is there a version of memset() which sets a value that is larger than 1 byte (char)? For example, let's say we have a memset32() function, so using it we can do the following:

    int32_t array[10];
    memset32(array, 0xDEADBEEF, sizeof(array));

This will set the value 0xDEADBEEF in all the elements of array. Currently it seems to me this can only be done with a loop.

Specifically, I am interested in a 64 bit version of memset(). Know anything like that?

gnobal asked Sep 20 '08 17:09



2 Answers

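    #include <stdint.h>
    #include <string.h>

    void memset64( void * dest, uint64_t value, uintptr_t size )
    {
      uintptr_t i;
      /* Whole 8-byte chunks: memcpy() assumes nothing about dest's alignment. */
      for( i = 0; i < (size & (~(uintptr_t)7)); i+=8 )
      {
        memcpy( ((char*)dest) + i, &value, 8 );
      }
      /* Tail: fill the remaining size % 8 bytes from the first bytes of value. */
      for( ; i < size; i++ )
      {
        ((char*)dest)[i] = ((char*)&value)[i&7];
      }
    }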

(Explanation, as requested in the comments: when you assign through a pointer, the compiler assumes that the pointer is aligned to the type's natural alignment; for uint64_t, that is typically 8 bytes. memcpy() makes no such assumption. On some hardware unaligned accesses are impossible, so assignment is not a suitable solution unless you know unaligned accesses work on the hardware with small or no penalty, or know that they will never occur, or both. The compiler will replace small memcpy()s and memset()s with more suitable code, so it is not as horrible as it looks; but if you do know enough to guarantee assignment will always work, and your profiler tells you it is faster, you can replace the memcpy with an assignment. The second for() loop is present in case the amount of memory to be filled is not a multiple of 64 bits. If you know it always will be, you can simply drop that loop.)
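For the 32-bit case from the question, the same pattern might look like the sketch below. The name my_memset32 is made up for illustration (names beginning with "mem" plus a lowercase letter are reserved by the C standard for future library use, so a prefix is safer), and the size is taken in bytes, as in the question's example call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-bit analogue of the memset64() above.
 * Fills `size` bytes at `dest` with repeated copies of `value`. */
void my_memset32(void *dest, uint32_t value, size_t size)
{
    unsigned char *p = dest;
    size_t i;

    assert(dest != NULL || size == 0);

    /* Whole 4-byte chunks; memcpy() copes with any alignment of dest. */
    for (i = 0; i + sizeof value <= size; i += sizeof value) {
        memcpy(p + i, &value, sizeof value);
    }

    /* Tail: the remaining size % 4 bytes get the first bytes of value,
     * in whatever byte order the platform stores it. */
    for (; i < size; i++) {
        p[i] = ((unsigned char *)&value)[i % sizeof value];
    }
}
```

With this, the call from the question would read my_memset32(array, 0xDEADBEEF, sizeof(array)). Note that the tail bytes depend on endianness, so a size that is not a multiple of 4 gives platform-dependent contents in the last few bytes.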

moonshadow answered Sep 22 '22 19:09


There's no standard library function afaik. So if you're writing portable code, you're looking at a loop.

If you're writing non-portable code then check your compiler/platform documentation, but don't hold your breath because it's rare to get much help here. Maybe someone else will chip in with examples of platforms which do provide something.

The way you'd write your own depends on whether you can define in the API that the caller guarantees the dst pointer will be sufficiently aligned for 64-bit writes on your platform (or platforms if portable). On any platform that has a 64-bit integer type at all, malloc at least will return suitably-aligned pointers.

If you have to cope with non-alignment, then you need something like moonshadow's answer. The compiler may inline/unroll that memcpy with a size of 8 (and use 32- or 64-bit unaligned write ops if they exist), so the code should be pretty nippy, but my guess is it probably won't special-case the whole function for the destination being aligned. I'd love to be corrected, but fear I won't be.
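If you want the aligned case special-cased by hand rather than hoping the compiler does it, a rough sketch could check the pointer at run time and pick a path. The name fill64 is hypothetical, _Alignof needs C11, and the direct-store branch assumes the destination may legitimately be accessed as uint64_t (malloc'd memory or an actual uint64_t array, for instance). Whether this beats the plain memcpy version is something only a profiler can tell you.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: fill `size` bytes at `dest` with repeated copies of `value`,
 * using direct 64-bit stores when dest happens to be suitably aligned. */
void fill64(void *dest, uint64_t value, size_t size)
{
    unsigned char *p = dest;
    size_t whole = size - size % sizeof value;  /* bytes in full 8-byte chunks */
    size_t i = 0;

    assert(dest != NULL || size == 0);

    if ((uintptr_t)dest % _Alignof(uint64_t) == 0) {
        /* Aligned: plain assignment is safe here. */
        uint64_t *q = dest;
        for (; i < whole; i += sizeof value) {
            q[i / sizeof value] = value;
        }
    } else {
        /* Unaligned: fall back to memcpy(), as in moonshadow's answer. */
        for (; i < whole; i += sizeof value) {
            memcpy(p + i, &value, sizeof value);
        }
    }

    /* Tail bytes, if size is not a multiple of 8. */
    for (; i < size; i++) {
        p[i] = ((unsigned char *)&value)[i % sizeof value];
    }
}
```

Both branches produce the same bytes; the only difference is how the stores are issued.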

So if you know that the caller will always give you a dst with sufficient alignment for your architecture, and a length which is a multiple of 8 bytes, then do a simple loop writing a uint64_t (or whatever the 64-bit int is in your compiler) and you'll probably (no promises) end up with faster code. You'll certainly have shorter code.
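Something like the following is what I mean by the simple loop. The name fill_u64 is made up, and taking a uint64_t* plus an element count (rather than void* and a byte count) is my own choice to put the alignment and multiple-of-8 contract into the signature; the assert only catches a misaligned pointer in debug builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: set `count` 64-bit elements starting at `dst` to `value`.
 * The caller guarantees dst is suitably aligned for uint64_t. */
void fill_u64(uint64_t *dst, uint64_t value, size_t count)
{
    /* Debug-only sanity check of the caller's alignment guarantee (C11). */
    assert((uintptr_t)dst % _Alignof(uint64_t) == 0);

    for (size_t i = 0; i < count; i++) {
        dst[i] = value;
    }
}
```

For an array declared as uint64_t a[10], the call would be fill_u64(a, 0xDEADBEEFDEADBEEFULL, 10). A loop this plain also gives the compiler a fair chance to unroll or vectorise it, though that is not guaranteed.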

Whatever the case, if you do care about performance then profile it. If it's not fast enough try again with more optimisation. If it's still not fast enough, ask a question about an asm version for the CPU(s) on which it's not fast enough. memcpy/memset can get massive performance increases from per-platform optimisation.

Steve Jessop answered Sep 18 '22 19:09