I'm profiling some C# code. The method below is one of the most expensive ones. For the purpose of this question, assume that micro-optimization is the right thing to do. Is there an approach to improve performance of this method?
Changing the input parameter p to ulong[] would create a macro inefficiency.
    static ulong Fetch64(byte[] p, int ofs = 0)
    {
        unchecked
        {
            ulong result = p[0 + ofs] +
                ((ulong) p[1 + ofs] <<  8) +
                ((ulong) p[2 + ofs] << 16) +
                ((ulong) p[3 + ofs] << 24) +
                ((ulong) p[4 + ofs] << 32) +
                ((ulong) p[5 + ofs] << 40) +
                ((ulong) p[6 + ofs] << 48) +
                ((ulong) p[7 + ofs] << 56);
            return result;
        }
    }
Why not use BitConverter? I've got to believe Microsoft has spent some time tuning that code. Plus it deals with endian issues.
Here's how BitConverter turns a byte[] into a long/ulong (ToUInt64 converts the bytes as a signed long and then casts the result to unsigned):
    [SecuritySafeCritical]
    public static unsafe long ToInt64(byte[] value, int startIndex)
    {
        if (value == null)
        {
            ThrowHelper.ThrowArgumentNullException(ExceptionArgument.value);
        }
        if (((ulong) startIndex) >= value.Length)
        {
            ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.startIndex, ExceptionResource.ArgumentOutOfRange_Index);
        }
        if (startIndex > (value.Length - 8))
        {
            ThrowHelper.ThrowArgumentException(ExceptionResource.Arg_ArrayPlusOffTooSmall);
        }
        fixed (byte* numRef = &(value[startIndex]))
        {
            if ((startIndex % 8) == 0)
            {
                return *(((long*) numRef));
            }
            if (IsLittleEndian)
            {
                int num = ((numRef[0] | (numRef[1] << 8)) | (numRef[2] << 0x10)) | (numRef[3] << 0x18);
                int num2 = ((numRef[4] | (numRef[5] << 8)) | (numRef[6] << 0x10)) | (numRef[7] << 0x18);
                return (((long) ((ulong) num)) | (num2 << 0x20));
            }
            int num3 = (((numRef[0] << 0x18) | (numRef[1] << 0x10)) | (numRef[2] << 8)) | numRef[3];
            int num4 = (((numRef[4] << 0x18) | (numRef[5] << 0x10)) | (numRef[6] << 8)) | numRef[7];
            return (((long) ((ulong) num4)) | (num3 << 0x20));
        }
    }
I suspect that doing the conversion one 32-bit word at a time is for 32-bit efficiency. A 32-bit CPU has no 64-bit registers, so dealing with 64-bit ints is a lot more expensive there.
If you know for sure you're targeting 64-bit hardware, it might be faster to do the conversion in one fell swoop.
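For illustration, here is a minimal sketch of the "one fell swoop" idea: a single 64-bit load through a pointer. The class and method names (Fetch, Fetch64Unsafe) are hypothetical, the code needs to be compiled with unsafe blocks enabled, and it assumes a little-endian host that tolerates unaligned reads (true of x86/x64). On a big-endian host the result would differ from Fetch64. Whether it actually beats the shift-and-add version is something to measure.

```csharp
using System;

static class Fetch
{
    // Hypothetical helper: reads 8 bytes starting at ofs with one 64-bit load.
    // The value is interpreted in host byte order, so it only matches the
    // original Fetch64 when BitConverter.IsLittleEndian is true.
    public static unsafe ulong Fetch64Unsafe(byte[] p, int ofs = 0)
    {
        // The pointer read bypasses the array's own bounds checks,
        // so validate the range by hand first.
        if (p == null) throw new ArgumentNullException(nameof(p));
        if (ofs < 0 || ofs > p.Length - 8) throw new ArgumentOutOfRangeException(nameof(ofs));

        fixed (byte* b = p)
        {
            // Possibly unaligned read; assumed acceptable on the target CPU.
            return *(ulong*)(b + ofs);
        }
    }
}
```

With the bytes 1, 2, ..., 8 at the given offset, a little-endian machine yields 0x0807060504030201, the same value the original Fetch64 produces.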
Try using a for loop instead of unrolling it by hand. You may be able to save time on bounds checks.
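A rough sketch of that loop form, for comparison against the unrolled version. The names FetchLoop and Fetch64Loop are made up for this example, and whether the JIT really drops any bounds checks with an ofs + i index is an assumption to verify with a profiler.

```csharp
static class FetchLoop
{
    // Hypothetical loop-based variant of Fetch64: assembles the value
    // least significant byte first (little-endian), like the original.
    public static ulong Fetch64Loop(byte[] p, int ofs = 0)
    {
        ulong result = 0;
        for (int i = 0; i < 8; i++)
        {
            // Byte i contributes bits 8*i .. 8*i+7 of the result.
            result |= (ulong)p[ofs + i] << (8 * i);
        }
        return result;
    }
}
```

Because each byte lands in its own 8-bit slot, combining with | gives the same result as the + used in the original.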
Try BitConverter.ToUInt64 - http://msdn.microsoft.com/en-us/library/system.bitconverter.touint64.aspx if it is what you are looking for.
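A drop-in wrapper might look like the sketch below. The wrapper name Fetch64BitConverter is hypothetical. Note that BitConverter reads in the host's byte order, whereas the original Fetch64 is always little-endian, so this sketch falls back to manual assembly on a big-endian host.

```csharp
using System;

static class FetchBitConverter
{
    // Hypothetical wrapper around BitConverter.ToUInt64 with the same
    // signature as the original Fetch64.
    public static ulong Fetch64BitConverter(byte[] p, int ofs = 0)
    {
        if (BitConverter.IsLittleEndian)
        {
            // Host order is little-endian, so this matches Fetch64.
            return BitConverter.ToUInt64(p, ofs);
        }

        // Big-endian host: build the little-endian value byte by byte.
        ulong result = 0;
        for (int i = 0; i < 8; i++)
        {
            result |= (ulong)p[ofs + i] << (8 * i);
        }
        return result;
    }
}
```

BitConverter validates its arguments and throws if fewer than 8 bytes remain after ofs, so the failure mode differs slightly from the IndexOutOfRangeException of the original.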