I'm working on an IEqualityComparer
which is supposed to compare arrays of primitive types really fast. My plan is to obtain pointers to the arrays and memcmp
them. Like this:
public unsafe override bool Equals(T[] x, T[] y)
{
if (ReferenceEquals(x, y)) return true;
if (x == null || y == null) return false;
if (x.Length != y.Length) return false;
var xArray = (Array)x;
var yArray = (Array)y;
fixed (void* xPtr = xArray) //compiler error 1
fixed (T* yPtr = y) //compiler error 2
{
return memcmp(xPtr, yPtr, x.Length * this.elementSize);
}
}
The fixed statement does not allow me to pin either Array
or T[]
.
There error messages are:
1. Cannot implicitly convert type 'System.Array' to 'void*'
2. Cannot take the address of, get the size of, or declare a pointer to a managed type ('T')
Now, I don't actually care how I make this work (I'm not committed to this exact approach). How can I memcmp
two T[]
where I know that T
is a primitive/blittable type?
I really want to avoid switching on the type and creating a specialized (and duplicated) code version for each interesting type. Any kind of reflection solution is not workable due to performance constraints (and yes I really need the performance here - no premature optimization warnings apply as it is customary on Stack Overflow).
where I know that T is a primitive/blittable type
You know it, the compiler doesn't know that. The CLR demands that everything in a pinned object can no longer be moved by the garbage collector. For an Array, that includes its array elements. Only kind of T that qualifies is a simple value type, one that is blittable. Generics does not give you a way to constrain T to a blittable type.
You'd normally declare the arguments of memcmp() as byte[]. The pinvoke marshaller then already does the right thing and will pin the byte[] arrays before calling memcmp(). This however will not work either since you cannot easily convert a T[] to a byte[] either. You'll have to pin yourself with GCHandle. Declare the memcmp() arguments as IntPtr instead of byte[] accordingly.
The subset of types which can work is in practice small enough to consider simply writing method overloads instead of a generic method. Which now enables the pinvoke marshaller to take care of the pinning, overload the memcmp() function declarations accordingly.
You can write a separate P/Invoke front-end for each kind of array that you want to compare. I know you don't really want to specialise on T, but the overhead isn't too great I think.
This is a bit of a hack, because I'm defining multiple P/Invoke methods with different signatures for the same API function, but by doing this I can leverage the P/Invoke marshalling support.
(Note that the sign of the return value from memcmp is really only meaningful if the source data is indeed an array of bytes. If you are giving it an array of ints, you should only compare the return value with zero and ignore its sign. The ordering that it implies is not meaningful for ints.)
For example, the following code prints the following for me (RELEASE build, not debug build):
MemCmp with ints took 00:00:08.0768666
ManagedMemCmp with ints took 00:00:10.3750453
MemCmp with bytes took 00:00:01.8740001
ManagedMemCmp with bytes took 00:00:09.2885763
Note that the byte[] test is using bytes so is comparing a quarter of the number of bytes than the int[] test uses. The managed code makes the same number of comparisons, so it is comparatively a lot slower.
There isn't really a huge difference between the times for comparing arrays of ints, but there is a large difference for comparing arrays of bytes. This suggests to me that there could be an managed optimisation that uses fixed to get a pointer to ints from a byte array in order to compare 4 bytes at a time (with some fiddling for the possibly extra bytes at the end that don't fit into an int).
I also think you could write a multithreaded managed version (using the "int*" optimisation to compare byte arrays) which would be MUCH FASTER than the unmanaged memcmp(), which is of course not multithreaded (as far as I know).
Anyway, here's my test code. Remember, RELEASE build, not debug!
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
namespace Demo
{
public static class Program
{
private static void Main(string[] args)
{
int[] a1 = new int[1000000];
int[] a2 = new int[1000000];
for (int i = 0; i < a1.Length-1; ++i)
{
a1[i] = i;
a2[i] = i;
}
a1[a1.Length-1] = 1;
a2[a1.Length-1] = 2;
byte[] b1 = new byte[1000000];
byte[] b2 = new byte[1000000];
for (int i = 0; i < b1.Length-1; ++i)
{
b1[i] = (byte)i;
b2[i] = (byte)i;
}
b1[a1.Length-1] = 1;
b2[a1.Length-1] = 2;
Stopwatch sw = Stopwatch.StartNew();
testWithMemCmp(a1, a2);
sw.Stop();
Console.WriteLine("MemCmp with ints took " + sw.Elapsed);
sw.Restart();
testWithManagedMemCmp(a1, a2);
sw.Stop();
Console.WriteLine("ManagedMemCmp with ints took " + sw.Elapsed);
sw.Restart();
testWithMemCmp(b1, b2);
sw.Stop();
Console.WriteLine("MemCmp with bytes took " + sw.Elapsed);
sw.Restart();
testWithManagedMemCmp(b1, b2);
sw.Stop();
Console.WriteLine("ManagedMemCmp with bytes took " + sw.Elapsed);
}
private static void testWithMemCmp(int[] a1, int[] a2)
{
for (int j = 0; j < COUNT; ++j)
{
MemCmp(a1, a2);
}
}
private static void testWithMemCmp(byte[] a1, byte[] a2)
{
for (int j = 0; j < COUNT; ++j)
{
MemCmp(a1, a2);
}
}
private static void testWithManagedMemCmp(int[] a1, int[] a2)
{
for (int j = 0; j < COUNT; ++j)
{
ManagedMemCmp(a1, a2);
}
}
private static void testWithManagedMemCmp(byte[] a1, byte[] a2)
{
for (int j = 0; j < COUNT; ++j)
{
ManagedMemCmp(a1, a2);
}
}
public static bool ManagedMemCmp(int[] a1, int[] a2)
{
if (a1 == null || a2 == null || a1.Length != a2.Length)
{
throw new InvalidOperationException("Arrays are null or different lengths.");
}
for (int i = 0; i < a1.Length; ++i)
{
if (a1[i] != a2[i])
{
return false;
}
}
return true;
}
public static bool ManagedMemCmp(byte[] a1, byte[] a2)
{
if (a1 == null || a2 == null || a1.Length != a2.Length)
{
throw new InvalidOperationException("Arrays are null or different lengths.");
}
for (int i = 0; i < a1.Length; ++i)
{
if (a1[i] != a2[i])
{
return false;
}
}
return true;
}
public static bool MemCmp(byte[] a1, byte[] a2)
{
if (a1 == null || a2 == null || a1.Length != a2.Length)
{
throw new InvalidOperationException("Arrays are null or different lengths.");
}
return memcmp(a1, a2, new UIntPtr((uint)a1.Length)) == 0;
}
public static bool MemCmp(int[] a1, int[] a2)
{
if (a1 == null || a2 == null || a1.Length != a2.Length)
{
throw new InvalidOperationException("Arrays are null or different lengths.");
}
return memcmp(a1, a2, new UIntPtr((uint)(a1.Length * sizeof(int)))) == 0;
}
[DllImport("msvcrt.dll")]
private static extern int memcmp(byte[] a1, byte[] a2, UIntPtr count);
[DllImport("msvcrt.dll")]
private static extern int memcmp(int[] a1, int[] a2, UIntPtr count);
private const int COUNT = 10000;
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With