There's got to be a faster and better way to swap the bytes of 16-bit words than this:
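    // Swaps each pair of adjacent bytes in place.
    // Assumes data.Length is even; an odd length throws on the last data[i + 1].
    public static void Swap(byte[] data)
    {
        for (int i = 0; i < data.Length; i += 2)
        {
            byte b = data[i];
            data[i] = data[i + 1];
            data[i + 1] = b;
        }
    }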
Does anyone have an idea?
I always liked this:
    public static Int64 SwapByteOrder(Int64 value)
    {
        var uvalue = (UInt64)value;
        UInt64 swapped =
            ( (0x00000000000000FF) & (uvalue >> 56)
            | (0x000000000000FF00) & (uvalue >> 40)
            | (0x0000000000FF0000) & (uvalue >> 24)
            | (0x00000000FF000000) & (uvalue >> 8)
            | (0x000000FF00000000) & (uvalue << 8)
            | (0x0000FF0000000000) & (uvalue << 24)
            | (0x00FF000000000000) & (uvalue << 40)
            | (0xFF00000000000000) & (uvalue << 56));
        return (Int64)swapped;
    }
I believe you'll find this is the fastest method, as well as being fairly readable and safe. Obviously this applies to 64-bit values, but the same technique could be used for 32- or 16-bit values.
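For the 32- and 16-bit cases, the same shift-and-mask technique might look like the sketch below. The class name `ByteSwapSketch` and the method names `Swap16` and `Swap32` are made up for illustration and are not part of the answer above.

```csharp
using System;

public static class ByteSwapSketch
{
    // Swaps the two bytes of a 16-bit value: 0xAABB becomes 0xBBAA.
    // The operands are promoted to int, so the result is truncated back
    // to 16 bits; unchecked keeps that truncation from throwing in a
    // checked build.
    public static UInt16 Swap16(UInt16 value)
    {
        return unchecked((UInt16)((value >> 8) | (value << 8)));
    }

    // Reverses the four bytes of a 32-bit value: 0xAABBCCDD becomes 0xDDCCBBAA.
    public static UInt32 Swap32(UInt32 value)
    {
        return (0x000000FFu & (value >> 24))
             | (0x0000FF00u & (value >> 8))
             | (0x00FF0000u & (value << 8))
             | (0xFF000000u & (value << 24));
    }
}
```

For example, `ByteSwapSketch.Swap16(0x1234)` returns `0x3412` and `ByteSwapSketch.Swap32(0x12345678)` returns `0x78563412`. Note that these swap a single value; to use them on a byte array you would still need a loop over its 16-bit words.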
In my attempt to apply for the Uberhacker award, I submit the following. For my testing, I used a source array of 8,192 bytes and called SwapX2 100,000 times:
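    // Requires compiling with /unsafe.
    public static unsafe void SwapX2(Byte[] source)
    {
        // Pinning the array itself (instead of &source[0]) also works for an empty array.
        fixed (Byte* pSource = source)
        {
            Byte* bp = pSource;
            // Stop at the last complete 16-bit word so an odd trailing byte is left alone.
            Byte* bp_stop = bp + (source.Length & ~1);
            while (bp < bp_stop)
            {
                // Relies on little-endian storage: the UInt16 store writes the low byte first.
                *(UInt16*)bp = (UInt16)(*bp << 8 | *(bp + 1));
                bp += 2;
            }
        }
    }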
My benchmarking indicates that this version is over 1.8 times faster than the code submitted in the original question.
This way appears to be slightly faster than the method in the original question:
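    // Shared scratch buffer; reused across calls, so this method is not thread-safe.
    private static byte[] _temp = new byte[0];

    // Assumes data.Length is even and greater than zero.
    public static void Swap(byte[] data)
    {
        if (data.Length > _temp.Length)
        {
            _temp = new byte[data.Length];
        }
        // Shift everything down by one byte: the odd-indexed bytes land on even indexes.
        Buffer.BlockCopy(data, 1, _temp, 0, data.Length - 1);
        // Place the even-indexed bytes on the odd indexes.
        for (int i = 0; i < data.Length; i += 2)
        {
            _temp[i + 1] = data[i];
        }
        Buffer.BlockCopy(_temp, 0, data, 0, data.Length);
    }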
My benchmarking assumed that the method is called repeatedly, so that the resizing of the _temp array isn't a factor. This method relies on the fact that half of the byte-swapping can be done with the initial Buffer.BlockCopy(...) call (with the source position offset by 1).
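To make the offset-by-one trick easier to follow, here is a small sketch that performs the same two steps on a fresh array and notes the intermediate state. The class `BlockCopyTraceSketch` and the method `SwapViaBlockCopy` are hypothetical names for illustration; unlike the method above, this version returns a new array and assumes an even, non-zero length.

```csharp
using System;

public static class BlockCopyTraceSketch
{
    // Returns a byte-swapped copy of data (assumes an even, non-zero length).
    public static byte[] SwapViaBlockCopy(byte[] data)
    {
        var temp = new byte[data.Length];

        // Step 1: copy data[1..] to temp[0..].
        // For data = { A, B, C, D } this gives temp = { B, C, D, 0 },
        // so B and D already sit at their final (even) positions.
        Buffer.BlockCopy(data, 1, temp, 0, data.Length - 1);

        // Step 2: write the even-indexed source bytes to the odd positions.
        // temp becomes { B, A, D, C }.
        for (int i = 0; i < data.Length; i += 2)
        {
            temp[i + 1] = data[i];
        }

        return temp;
    }
}
```

With the input `{ 0xA1, 0xB2, 0xC3, 0xD4 }`, the result is `{ 0xB2, 0xA1, 0xD4, 0xC3 }`.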
Please benchmark this yourselves, in case I've completely lost my mind. In my tests, this method takes approximately 70% as long as the original method (which I modified to declare the byte b outside of the loop).
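If you want to check these numbers yourself, a rough timing harness could look like the sketch below. It uses `System.Diagnostics.Stopwatch` with the buffer size and call count mentioned in the SwapX2 answer (8,192 bytes, 100,000 calls). The class `SwapBenchmarkSketch` and its `Measure` helper are assumptions made for illustration, not the harness anyone in this thread actually used, and results will vary with runtime, build configuration, and hardware.

```csharp
using System;
using System.Diagnostics;

public static class SwapBenchmarkSketch
{
    // The loop from the original question, used here as the baseline.
    // Assumes data.Length is even.
    public static void SwapSimple(byte[] data)
    {
        for (int i = 0; i < data.Length; i += 2)
        {
            byte b = data[i];
            data[i] = data[i + 1];
            data[i + 1] = b;
        }
    }

    // Times `iterations` calls of `swap` on a single buffer of `size` bytes.
    public static TimeSpan Measure(Action<byte[]> swap, int size, int iterations)
    {
        var data = new byte[size];
        new Random(12345).NextBytes(data); // fixed seed for repeatable input

        swap(data); // warm-up call so JIT compilation is not included in the timing

        Stopwatch watch = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++)
        {
            swap(data);
        }
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        TimeSpan elapsed = Measure(SwapSimple, 8192, 100000);
        Console.WriteLine("SwapSimple: {0} ms", elapsed.TotalMilliseconds);
    }
}
```

To compare candidates, pass each method to `Measure` in turn (for example `Measure(SwapX2, 8192, 100000)`), and run a Release build outside the debugger, since debug builds can distort the relative timings.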