Is there a function (SSEx intrinsics are OK) that fills memory with a specified int32_t
value? For instance, when the value is 0xAABBCC00,
the resulting memory should look like:
AABBCC00AABBCC00AABBCC00AABBCC00AABBCC00
AABBCC00AABBCC00AABBCC00AABBCC00AABBCC00
AABBCC00AABBCC00AABBCC00AABBCC00AABBCC00
AABBCC00AABBCC00AABBCC00AABBCC00AABBCC00
...
I could use std::fill or a simple for loop, but neither is fast enough.
The vector is resized only once, at the start of the program, so that is not an issue. The bottleneck is filling the memory.
Simplified code:
#include <algorithm>  // std::fill
#include <cstdint>    // int32_t
#include <vector>

struct X
{
    typedef std::vector<int32_t> int_vec_t;
    int_vec_t buffer;
    X() : buffer( 5000000 ) { /* some more action */ }
    ~X() { /* some code here */ }
    // the following function is called 25 times per second
    const int_vec_t& process( int32_t background, const SOME_DATA& data );
};

const X::int_vec_t& X::process( int32_t background, const SOME_DATA& data )
{
    // the following single line takes 30% of the total time of the process function
    std::fill( buffer.begin(), buffer.end(), background );
    // some processing
    // ...
    return buffer;
}
1) int32_t provides an integer that is exactly 32 bits wide. This matters because it lets you port an application to a different platform without rewriting the algorithm (provided it still compiles). And yes, int is not always 16, 32, or 64 bits wide; check the C reference. See also the self-explanatory page on the stdint.h types.
To help address the above downsides, C++ also defines two alternative sets of integers that are guaranteed to be defined. The fast types (std::int_fast#_t and std::uint_fast#_t) provide the fastest signed/unsigned integer type with a width of at least # bits (where # = 8, 16, 32, or 64).
If you assume an int is 4 bytes because that’s most likely, then your program will probably misbehave on architectures where int is actually 2 bytes (since you will probably be storing values that require 4 bytes in a 2 byte variable, which will cause overflow or undefined behavior).
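The width guarantees described above can be checked at compile time. The following is only a minimal sketch; the helper name bit_width_of is hypothetical and not part of any library.

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <cstdint>   // std::int32_t, std::int_fast32_t, std::int_least32_t

// Hypothetical helper: width in bits of the storage used for T
// on the compiler that builds this code.
template <typename T>
constexpr std::size_t bit_width_of()
{
    return sizeof(T) * CHAR_BIT;
}

// int32_t, where it exists, is exactly 32 bits wide.
static_assert(bit_width_of<std::int32_t>() == 32, "int32_t must be 32 bits");
// The fast and least variants only promise a minimum width.
static_assert(bit_width_of<std::int_fast32_t>() >= 32, "at least 32 bits");
static_assert(bit_width_of<std::int_least32_t>() >= 32, "at least 32 bits");
// Plain int is only guaranteed to hold 16 bits.
static_assert(bit_width_of<int>() >= 16, "int has at least 16 bits");
```

If one of these assumptions does not hold on a target platform, the build fails instead of the program misbehaving at run time.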
For consistency, it’s best to avoid std::int8_t and std::uint8_t (and the related fast and least types) altogether (use std::int16_t or std::uint16_t instead). The 8-bit fixed-width integer types are often treated like chars instead of integer values (and this may vary per system). Prefer the 16-bit fixed integral types for most cases.
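A small sketch of the char-like behaviour mentioned above. It assumes a common implementation where std::int8_t is an alias for signed char; the function names show_raw and show_as_int are made up for this illustration.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Streams the value directly. Where int8_t is an alias for signed char,
// the stream picks the character overload and prints a character.
std::string show_raw(std::int8_t v)
{
    std::ostringstream os;
    os << v;
    return os.str();
}

// Converts to int first, so the stream prints the numeric value.
std::string show_as_int(std::int8_t v)
{
    std::ostringstream os;
    os << static_cast<int>(v);
    return os.str();
}
```

On such an implementation show_raw(65) yields the character A, while show_as_int(65) yields the digits 65.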
This is how I would do it (please excuse the Microsoft-ness of it):
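#include <windows.h>    // VOID, PLONG, LONG, ULONG, ULONG_PTR
#include <emmintrin.h>  // SSE2: __m128i, _mm_set1_epi32, _mm_store_si128

// M must be at least 4-byte aligned (true for any properly aligned LONG array).
VOID FillInt32(__out PLONG M, __in LONG Fill, __in ULONG Count)
{
    // Fix misalignment: store single LONGs until M reaches a 16-byte boundary.
    // The cases fall through on purpose, so offset 0x4 stores up to three
    // values, 0x8 up to two, and 0xc one.
    switch ((ULONG_PTR)M & 0xf)
    {
    case 0x4: if (Count >= 1) { *M++ = Fill; Count--; }
    case 0x8: if (Count >= 1) { *M++ = Fill; Count--; }
    case 0xc: if (Count >= 1) { *M++ = Fill; Count--; }
    }
    // Broadcast Fill into all four 32-bit lanes.
    __m128i f = _mm_set1_epi32(Fill);
    // Aligned 16-byte stores, four LONGs per iteration.
    while (Count >= 4)
    {
        _mm_store_si128((__m128i *)M, f);
        M += 4;
        Count -= 4;
    }
    // Fill remaining LONGs (fall-through is intentional).
    switch (Count & 0x3)
    {
    case 0x3: *M++ = Fill;
    case 0x2: *M++ = Fill;
    case 0x1: *M++ = Fill;
    }
}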
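For readers not on MSVC, the same idea can be written with standard fixed-width types. This is only a sketch of the approach above, not a tuned implementation: the name fill_int32_sse2 is hypothetical, and it assumes an x86 target with SSE2 and a destination pointer that is at least 4-byte aligned.

```cpp
#include <emmintrin.h>  // SSE2: __m128i, _mm_set1_epi32, _mm_store_si128
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: fills dst[0 .. count) with value.
// Assumes dst is at least 4-byte aligned, as any int32_t array is.
inline void fill_int32_sse2(std::int32_t* dst, std::int32_t value, std::size_t count)
{
    assert(reinterpret_cast<std::uintptr_t>(dst) % alignof(std::int32_t) == 0);

    // Scalar head: advance until dst sits on a 16-byte boundary.
    while (count > 0 && (reinterpret_cast<std::uintptr_t>(dst) & 0xF) != 0) {
        *dst++ = value;
        --count;
    }

    // Broadcast the value into all four 32-bit lanes.
    const __m128i v = _mm_set1_epi32(value);

    // Aligned 16-byte stores, four elements per iteration.
    for (; count >= 4; count -= 4, dst += 4) {
        _mm_store_si128(reinterpret_cast<__m128i*>(dst), v);
    }

    // Scalar tail: at most three elements remain.
    while (count > 0) {
        *dst++ = value;
        --count;
    }
}
```

In the question's code this would be called as fill_int32_sse2(buffer.data(), background, buffer.size()). Optimizing compilers can often vectorize std::fill on their own, so it is worth measuring both versions before committing to the hand-written one.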