Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C# byte[] comparison without bound checks

I am looking for performance efficient ways to compare two byte[] for equality. Sizes are above 1 MB, so the overhead for each array element should be minimized.

I aim to beat the speeds of SequenceEqual or a hand-coded for-loop over every item, by avoiding the repetitive bound checks for both arrays. In the same way that Array.Copy could lead to fast memcpy, what will lead to a memcmp?

like image 585
shojtsy Avatar asked Jan 31 '10 21:01

shojtsy


People also ask

What C is used for?

C programming language is a machine-independent programming language that is mainly used to create many types of applications and operating systems such as Windows, and other complicated programs such as the Oracle database, Git, Python interpreter, and games and is considered a programming foundation in the process of ...

What is C full form?

Full form of C is “COMPILE”.

Is C language easy?

C is a general-purpose language that most programmers learn before moving on to more complex languages. From Unix and Windows to Tic Tac Toe and Photoshop, several of the most commonly used applications today have been built on C. It is easy to learn because: A simple syntax with only 32 keywords.

What is C language basics?

What is C? C is a general-purpose programming language created by Dennis Ritchie at the Bell Laboratories in 1972. It is a very popular language, despite being old. C is strongly associated with UNIX, as it was developed to write the UNIX operating system.


3 Answers

You can use unsafe code to do pointer operations. You can compare the bytes four at a time as integers:

public static bool ArrayCompare(byte[] a, byte[] b) {
  if (a.Length != b.Length) return false;
  int len = a.Length;
  unsafe {
    fixed(byte* ap = a, bp = b) {
      int* aip = (int*)ap, bip = (int*)bp;
      for (;len >= 4;len-=4) {
        if (*aip != *bip) return false;
        aip++;
        bip++;
      }
      byte* ap2 = (byte*)aip, bp2 = (byte*)bip;
      for (;len>0;len--) {
        if (*ap2 != *bp2) return false;
        ap2++;
        bp2++;
      }
    }
  }
  return true;
}

A tested this against a simple loop, and it's about six times faster.

As suggested by Josh Einstein, long could be used on a 64 bit system. Actually it seems to be almost twice as fast both on 32 and 64 bit systems:

public static bool ArrayCompare64(byte[] a, byte[] b) {
  if (a.Length != b.Length) return false;
  int len = a.Length;
  unsafe {
    fixed (byte* ap = a, bp = b) {
      long* alp = (long*)ap, blp = (long*)bp;
      for (; len >= 8; len -= 8) {
        if (*alp != *blp) return false;
        alp++;
        blp++;
      }
      byte* ap2 = (byte*)alp, bp2 = (byte*)blp;
      for (; len > 0; len--) {
        if (*ap2 != *bp2) return false;
        ap2++;
        bp2++;
      }
    }
  }
  return true;
}
like image 148
Guffa Avatar answered Oct 20 '22 10:10

Guffa


If performance really matters then the fastest way to do it is by using the CRT library included with every version of Windows. This code takes ~51 msec on my poky laptop, works on 64-bit machines too:

using System;
using System.Runtime.InteropServices;
using System.Diagnostics;

class Program {
  static void Main(string[] args) {
    byte[] arr1 = new byte[50 * 1024 * 1024];
    byte[] arr2 = new byte[50 * 1024 * 1024];
    var sw = Stopwatch.StartNew();
    bool equal = memcmp(arr1, arr2, arr1.Length) == 0;
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);
    Console.ReadLine();
  }
  [DllImport("msvcrt.dll")]
  private static extern int memcmp(byte[] arr1, byte[] arr2, int cnt);
}
like image 35
Hans Passant Avatar answered Oct 20 '22 10:10

Hans Passant


From: http://www.pinvoke.net/default.aspx/msvcrt.memcmp : Belowmentioned signature (by Saar) of memcmp is a x64 only signature. Using the the x64 only signatures on an x86 machine will result in a PInvoke stack imbalance. For x86 and x64 platform compatibility make sure you use a signature which specifies the Cdecl calling convention and uses the UIntPtr type to correctly marshall the size_t count argument:

    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern int memcmp(byte[] b1, byte[] b2, UIntPtr count);

    static bool doImagesMatch(byte[] b1, byte[] b2) 
    {     
        return b1.Length == b2.Length && memcmp(b1, b2, new UIntPtr((uint)b1.Length)) == 0;
    } 

I use this code with success but I didn't have time to measure performance (yet). I'm using small array's of roughly 600 bytes. I have to use x86-compatible code because the vast majority of the computers in our non-profit organisation is x86.

Obviously you need a fast algorhythm to convert bitmap to byte[].

like image 1
B. Verhoeff Avatar answered Oct 20 '22 09:10

B. Verhoeff