
Compare 2 byte arrays

Tags:

arrays

c#

I have 2 int arrays.

int[] data1;
int[] data2;

I want to create a 3rd array, int[] data3, which holds the differences between the 2 other arrays.

Let us take the 1st value in data1.

The value is 15 (e.g.).

Now let us take the 1st value in data2.

The value is 3 (e.g.).

The 1st value in data3 would be 12.

BUT, if the 1st values were the other way round i.e.

data1[0]  = 3
data2[0]  = 15

then the difference would be -12. Yet I want it to be just 12.

At the moment I have a for loop and I do the computation there to get that type of result.

  1. Is there a way to do data1-data2 = data3 without enumerating through a loop?
  2. If so, can I just get the differences without using minus numbers?

Thanks

N.B. In response to the 'closers': I agree with you to a certain point. What I need to add to this question is this:

I am looking for the most efficient way (quickest, with low memory as 2nd priority) to do this. Using LINQ (as I understand it) can be the slowest approach?

asked Sep 09 '14 by Andrew Simpson




4 Answers

You are looking for the Zip method:

var data3 = data1.Zip(data2, (d1,d2) => Math.Abs(d1 - d2)).ToArray();

Enumerable.Zip<TFirst, TSecond, TResult> Method

Applies a specified function to the corresponding elements of two sequences, producing a sequence of the results.

So it simply takes each pair of corresponding elements, e.g. data1[0] and data2[0], then data1[1] and data2[1], and so on, then applies the function Math.Abs(d1 - d2), which subtracts the two numbers and takes the absolute value of the result. It returns a sequence containing the result of each operation.
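For instance, a quick sanity check of what Zip produces (the sample values here are made up, based on the 15/3 example from the question):

```csharp
using System;
using System.Linq;

class ZipDemo
{
    static void Main()
    {
        int[] data1 = { 15, 3, 7 };
        int[] data2 = { 3, 15, 7 };

        // Pairs elements by index and applies the selector to each pair.
        var data3 = data1.Zip(data2, (d1, d2) => Math.Abs(d1 - d2)).ToArray();

        Console.WriteLine(string.Join(", ", data3)); // 12, 12, 0
    }
}
```

Note that Zip stops at the shorter of the two sequences, so arrays of unequal length will silently produce a shorter result.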

answered Oct 13 '22 by Selman Genç


"Is there a way to do data1-data2 = data3 without enumerating through a loop?" No. It is technically impossible.

At best, or rather worst, you can call function that will do enumeration for you. But it will be slow. In case of LINQ, ungodly slow.

On the machine I am currently working on, the results for the other answers are as follows for a 4 KB table (1024 integers).

  • 23560 ticks - Giannis Paraskevopoulos. Array-Enumerable-Array conversions are not too fast; copying an array via the ToList().ToArray() chain is roughly 25 times slower than Array.Copy().
  • 10198 ticks - Selman22. 2 times faster, but still slow. Lambdas are eye candy to make creating events prettier, not faster. You end up with an anonymous method lying around, whose call-return can eat more CPU time than its actual operation (remember that the math we do here the CPU can do in just a few cycles).
  • 566 ticks - Tim Schmelter's GetDifference() function (the main culprit here is the JIT; in native code and/or with more frequent use the difference would be negligible).
  • 27 ticks - Just a loop. 400 times faster than Zip, over 800 times faster than converting the array to a list and back.

Loop code:

for (int i = 0; i < data3.Length; i++)
{
  data3[i] = Math.Abs(data1[i] - data2[i]);
}

Such basic memory operations can be translated directly to machine code without the horrible performance and humongous memory footprint of LINQ.

Moral of the story: LINQ is for readability (which in this case is arguable), not for performance (which in this case is noticeable).
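If you want to reproduce such measurements yourself, a minimal Stopwatch harness like the one below is enough (the exact tick counts will differ per machine; the warm-up pass and random test data are my additions, not from the original benchmark):

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class Bench
{
    static void Main()
    {
        var rnd = new Random(42);
        int[] data1 = Enumerable.Range(0, 1024).Select(_ => rnd.Next(1000)).ToArray();
        int[] data2 = Enumerable.Range(0, 1024).Select(_ => rnd.Next(1000)).ToArray();
        int[] data3 = new int[1024];

        // Warm-up pass so JIT compilation is not counted in the measurement.
        for (int i = 0; i < data3.Length; i++)
            data3[i] = Math.Abs(data1[i] - data2[i]);

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < data3.Length; i++)
            data3[i] = Math.Abs(data1[i] - data2[i]);
        sw.Stop();

        Console.WriteLine($"plain loop: {sw.ElapsedTicks} ticks");
    }
}
```

Run it in a Release build, outside the debugger, for the reasons given below.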


Optimization time! Let's abuse our CPU a tiny bit.

  1. Unroll the loop. Or do not. Your experience may vary. Even in assembler itself, the performance gain or loss from loop unrolling varies greatly within the same family of processors. New CPUs and compilers are aware of the old tricks and simply apply them on their own. On the i3-3220 I tested this code on, unrolling to 4 lines resulted in faster execution in 32-bit code, but in 64-bit it was a bit slower, while unrolling to 8 was the opposite.
  2. Compile for x64. As we are working on 32-bit data here, we won't make use of 64-bit registers... or will we? On x86, less than half of the registers are truly available to generated code (in hand-written assembly you can always squeeze out more); on x64, however, you get eight bonus registers which are free to use. The more you can do without touching memory, the faster your code. In this case the speed gain is about 20%.
  3. Close Visual Studio. Do not speed-test 64-bit code in the 32-bit IDE (there is no 64-bit version as of now, and probably won't be for a long time). It will make x64 code roughly two times slower due to the architecture mismatch. (Well... you should never speed-test code under a debugger anyway...)
  4. Do not overuse built-in functions. In this case Math.Abs has overhead hidden inside. For some reason (an analysis of the IL would be needed to find out why), checking for negative values was faster with ?: than with if-else. Such a check saved a lot of time.

UPDATE: ?: is faster than if-else due to differences in the resulting machine code... at least for simply comparing two values. Its machine code is much less weird than that of if-else (which does not look like what you would write "by hand"). Apparently it is not just a different way of writing an if-else statement but a fully separate instruction optimized for simple conditional assignment.

The resulting code was roughly 8 times faster than a simple loop with Math.Abs(). Remember you can only unroll a loop to divisors of your dataset size. You wrote that your dataset size is 25920, so 8 is fine (the max here is 64, but I doubt going that high would make any sense). I suggest hiding this code in some function as it is fugly.

int[] data3 = new int[data1.Length];
for (int i = 0; i < data1.Length; i += 8)
{
    int b;
    b = (data1[i + 0] - data2[i + 0]);
    data3[i + 0] = b < 0 ? -b : b;
    b = (data1[i + 1] - data2[i + 1]);
    data3[i + 1] = b < 0 ? -b : b;
    b = (data1[i + 2] - data2[i + 2]);
    data3[i + 2] = b < 0 ? -b : b;
    b = (data1[i + 3] - data2[i + 3]);
    data3[i + 3] = b < 0 ? -b : b;
    b = (data1[i + 4] - data2[i + 4]);
    data3[i + 4] = b < 0 ? -b : b;
    b = (data1[i + 5] - data2[i + 5]);
    data3[i + 5] = b < 0 ? -b : b;
    b = (data1[i + 6] - data2[i + 6]);
    data3[i + 6] = b < 0 ? -b : b;
    b = (data1[i + 7] - data2[i + 7]);
    data3[i + 7] = b < 0 ? -b : b;
}
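As noted above, the unrolled loop only works when the length is a multiple of the unroll factor. For other lengths, a short tail loop (my addition, not part of the original answer) covers the remainder; a compacted sketch:

```csharp
using System;

class UnrollWithTail
{
    static void Main()
    {
        int[] data1 = { 15, 3, 7, 9, 2, 8, 5, 1, 4, 6 }; // length 10, not a multiple of 8
        int[] data2 = { 3, 15, 7, 1, 9, 2, 5, 8, 6, 4 };
        int[] data3 = new int[data1.Length];

        int blocks = data1.Length & ~7; // largest multiple of 8 <= Length
        for (int i = 0; i < blocks; i += 8)
        {
            for (int k = 0; k < 8; k++) // stands in for the 8 unrolled lines above
            {
                int b = data1[i + k] - data2[i + k];
                data3[i + k] = b < 0 ? -b : b;
            }
        }
        for (int i = blocks; i < data1.Length; i++) // tail loop for the leftovers
        {
            int b = data1[i] - data2[i];
            data3[i] = b < 0 ? -b : b;
        }

        Console.WriteLine(string.Join(", ", data3)); // 12, 12, 0, 8, 7, 6, 0, 7, 2, 2
    }
}
```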

This is not even its final form. I will try some more heretical tricks on it.

BitHacks, low-level cheats!

As I mentioned, there was still place for improvements.

After cutting out LINQ, the main tick muncher was Abs(). Once it was removed from the code we were left with a contest between if-else and the shorthand ?: operator. Both are branching operators, which in the past were widely known to be slower than linear code. Nowadays, ease of use/writing tends to be picked over performance (sometimes correctly, sometimes incorrectly).

So let's make our branching condition linear. That is possible by abusing the fact that the branch in this code contains math operating on just a single variable.

Now, do you remember how to negate a two's complement number? Negate all the bits and add one. Let's do that in one line, without conditions!

It is time for bitwise operators to shine. OR and AND are boring; real men use XOR. What's so cool about XOR? Aside from its usual behavior, you can also turn it into NOT (negation) and NOP (no operation).

1 XOR 1 = 0
0 XOR 1 = 1

so XOR'ing by a value filled with only 1's gives you a NOT operation.

1 XOR 0 = 1
0 XOR 0 = 0

so XOR'ing by a value filled with only 0's does nothing at all.

We can obtain the sign of our number. For a 32-bit integer it is as simple as x>>31, which moves the sign bit into the lowest bit. As even the wiki will tell you, bits inserted from the left will be zeros, so the result of x>>31 will be 1 for a negative number (x<0) and 0 for a non-negative one (x>=0), right?

Nope. For signed values an arithmetic shift is used instead of a plain bit-shift. So we get -1 or 0 depending on the sign... which means that x>>31 gives 111...111 for negative and 000...000 for non-negative. If you XOR the original x by the result of such a shift, you perform a NOT or a NOP depending on the value's sign. Another useful thing is that 0 results in a NOP for addition/subtraction, so we can subtract either -1 or 0 depending on the value's sign.

So x^(x>>31) will flip the bits of a negative number while making no changes to a non-negative one, and x-(x>>31) will add 1 to a negative x (a negated negative value gives a positive) and make no change to a non-negative value.

When combined you get '(x ^ (x >> 31)) - (x >> 31)'... which can be translated to:

IF X<0
  X=!X+1

and it is just

IF X<0
  X=-X

How does this affect performance? Our XorAbs() requires just four basic integer operations plus one load and one store. A branching operator by itself takes about as many CPU ticks. And while modern CPUs are great at branch prediction, they are still faster when they simply do not have to do it, i.e. when fed sequential code.

And what's the score?

  1. Roughly four times faster than the built-in Abs().
  2. About twice as fast as the previous code (the version without unrolling).
  3. Depending on the CPU, it can get better results without loop unrolling. Due to the elimination of branching, the CPU can "unroll" the loop on its own. (Haswells are weird with unrolling.)

Resulting code:

for (int i = 0; i < data1.Length; i++)
{
  int x = data1[i] - data2[i];
  data3[i] = (x ^ (x >> 31)) - (x >> 31);
}

Parallelism and Cache usage

CPUs have super fast cache memory; when processing an array sequentially, the CPU will copy whole chunks of it into the cache. But if you write crappy code you will get cache misses. You can easily fall into this trap by screwing up the order of nested loops.

Parallelism (multiple threads, same data) must work on sequential chunks to make good use of the CPU cache.

Writing threads by hand would let you pick the chunks for each thread manually, but that is a bothersome way. Since 4.0, .NET comes with helpers for that; however, the default Parallel.For makes a mess of the cache. So this code is actually slower than its single-threaded version due to cache misses:

Parallel.For(0, data1.Length,
fn =>
{
  int x = data1[fn] - data2[fn];
  data3[fn] = (x ^ (x >> 31)) - (x >> 31);
});

It is possible to make manual use of the cached data by performing sequential operations within it. For example, you can unroll the loop, but it is a dirty hack, and unrolling has its own performance issues (it depends on the CPU model).

Parallel.For(0, data1.Length >> 3,
block =>
{
    int i = block << 3; // each iteration handles 8 sequential elements
    int b;
    b = (data1[i + 0] - data2[i + 0]);
    data3[i + 0] = b < 0 ? (b ^ -1) + 1 : b;
    b = (data1[i + 1] - data2[i + 1]);
    data3[i + 1] = b < 0 ? (b ^ -1) + 1 : b;
    b = (data1[i + 2] - data2[i + 2]);
    data3[i + 2] = b < 0 ? (b ^ -1) + 1 : b;
    b = (data1[i + 3] - data2[i + 3]);
    data3[i + 3] = b < 0 ? (b ^ -1) + 1 : b;
    b = (data1[i + 4] - data2[i + 4]);
    data3[i + 4] = b < 0 ? (b ^ -1) + 1 : b;
    b = (data1[i + 5] - data2[i + 5]);
    data3[i + 5] = b < 0 ? (b ^ -1) + 1 : b;
    b = (data1[i + 6] - data2[i + 6]);
    data3[i + 6] = b < 0 ? (b ^ -1) + 1 : b;
    b = (data1[i + 7] - data2[i + 7]);
    data3[i + 7] = b < 0 ? (b ^ -1) + 1 : b;
});

However, .NET also has Parallel.ForEach and load-balancing partitioners. By using both of them you get the best of all worlds:

  • dataset size independent code
  • short, neat code
  • multithreading
  • good cache usage

So final code would be:

var rangePartitioner = Partitioner.Create(0, data1.Length);
Parallel.ForEach(rangePartitioner, (range, loopState)
=>
{
    for (int i = range.Item1; i < range.Item2; i++)
    {
        int x = data1[i] - data2[i];
        data3[i] = (x ^ (x >> 31)) - (x >> 31);
    }
});
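For completeness, Partitioner lives in System.Collections.Concurrent and Parallel in System.Threading.Tasks; a self-contained sketch of the final approach with tiny sample data (the sample values are mine):

```csharp
using System;
using System.Collections.Concurrent; // Partitioner
using System.Threading.Tasks;        // Parallel

class PartitionedAbsDiff
{
    static void Main()
    {
        int[] data1 = { 15, 3, 7, 9 };
        int[] data2 = { 3, 15, 7, 1 };
        int[] data3 = new int[data1.Length];

        // Each worker gets a contiguous [Item1, Item2) range, keeping memory access sequential.
        var rangePartitioner = Partitioner.Create(0, data1.Length);
        Parallel.ForEach(rangePartitioner, range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                int x = data1[i] - data2[i];
                data3[i] = (x ^ (x >> 31)) - (x >> 31);
            }
        });

        Console.WriteLine(string.Join(", ", data3)); // 12, 12, 0, 8
    }
}
```

The output order is deterministic even though the work is parallel, because each thread writes to its own fixed slice of data3.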

It is far from maximum CPU usage (which is more complicated than just maxing its clock; there are multiple cache levels, several pipelines and much more), but it is readable, fast and platform independent (except for integer size, but C#'s int is an alias for System.Int32, so we are safe here).

I think we will stop with the optimization here. This came out as an article rather than an answer; I hope no one will purge me for it.

answered Oct 13 '22 by PTwr


Here is another (less readable but maybe a little bit more efficient) approach that does not need LINQ:

public static int[] GetDifference(int[] first, int[] second)
{
    int commonLength = Math.Min(first.Length, second.Length);
    int[] diff = new int[commonLength];
    for (int i = 0; i < commonLength; i++)
        diff[i] = Math.Abs(first[i] - second[i]);
    return diff;
}

Why a little bit more efficient? Because ToArray has to resize the array repeatedly until it knows the final size.

answered Oct 13 '22 by Tim Schmelter


var data3 = data1.Select((x,i)=>new {x,i})
    .Join
    (
        data2.Select((x,i)=>new {x,i}),
        x=>x.i,
        x=>x.i,
        (d1,d2)=>Math.Abs(d1.x-d2.x)
    )
    .ToArray();
answered Oct 13 '22 by Giannis Paraskevopoulos