I have 2 int arrays.
int[] data1;
int[] data2;
I want to create a 3rd int[] data3 which holds the differences between the other 2 arrays.
Let us take the 1st value in data1.
The value is 15 (e.g.).
Now let us take the 1st value in data2.
The value is 3 (e.g.).
The 1st value in data3 would be 12.
BUT, if the 1st values were the other way round i.e.
data1[0] = 3
data2[0] = 15
then the difference would be -12. Yet I want it to be just 12.
At the moment I have a for loop and I do the computation there to get that type of result.
Thanks
N.B. In response to the 'closers': I agree with you up to a point. What I need to add to this question is this:
I am looking for the most efficient way to do this (quickest first, with low memory as a 2nd priority). Using LINQ (as I understand it) can be the slowest approach?
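For reference, the loop-based approach described above is presumably something like this (a sketch only; the array names are taken from the question):

// Assumed shape of the existing loop: absolute difference per element,
// so 15 and 3 give 12, and 3 and 15 also give 12.
int[] data3 = new int[data1.Length];
for (int i = 0; i < data1.Length; i++)
{
    data3[i] = Math.Abs(data1[i] - data2[i]);
}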
You are looking for the Zip method:
var data3 = data1.Zip(data2, (d1,d2) => Math.Abs(d1 - d2)).ToArray();
Enumerable.Zip<TFirst, TSecond, TResult> Method
Applies a specified function to the corresponding elements of two sequences, producing a sequence of the results.
So it simply takes each corresponding pair of elements, e.g. data1[0] and data2[0], then data1[1] and data2[1], and so on, then applies the function Math.Abs(d1 - d2), which simply subtracts the two numbers and takes the absolute value of the result. It then returns a sequence containing the result of each operation.
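A minimal usage sketch (requires using System.Linq; the sample values are made up):

int[] data1 = { 15, 7, 2 };
int[] data2 = { 3, 10, 2 };

// Pair elements up by index and take the absolute difference of each pair.
int[] data3 = data1.Zip(data2, (d1, d2) => Math.Abs(d1 - d2)).ToArray();
// data3 is now { 12, 3, 0 }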
"Is there a way to do data1-data2 = data3 without enumerating through a loop?" No. It is technically impossible.
At best, or rather worst, you can call a function that will do the enumeration for you. But it will be slow. In the case of LINQ, ungodly slow.
On the machine I am currently working on, I timed the approaches from the other answers against a 4 KB table (1,024 integers).
Loop code:
for (int i = 0; i < data3.Length; i++)
{
data3[i] = Math.Abs(data1[i] - data2[i]);
}
Such basic memory operations can be translated directly to machine code, without the horrible performance and humongous memory footprint of LINQ.
The moral of the story is: LINQ is for readability (which in this case is arguable), not for performance (which in this case is noticeable).
UPDATE: ?: is faster than if-else due to differences in the resulting machine code... at least for just comparing two values. Its machine code is much less weird than if-else's (which does not look like what you would write "by hand"). Apparently it is not just a different way of writing an if-else statement but a fully separate instruction optimized for simple conditional assignment.
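For comparison, the plain (non-unrolled) loop with ?: in place of Math.Abs() would look roughly like this:

for (int i = 0; i < data3.Length; i++)
{
    int b = data1[i] - data2[i];
    data3[i] = b < 0 ? -b : b; // conditional assignment instead of Math.Abs()
}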
The resulting unrolled code below was roughly 8 times faster than the simple loop with Math.Abs(). Remember that you can only unroll the loop by a divisor of your dataset size. You wrote that your dataset size is 25920, so 8 is fine (the max is 64, but I doubt it would make sense to go that high). I suggest hiding this code in some function, as it is fugly.
int[] data3 = new int[data1.Length];
for (int i = 0; i < data1.Length; i += 8)
{
int b;
b = (data1[i + 0] - data2[i + 0]);
data3[i + 0] = b < 0 ? -b : b;
b = (data1[i + 1] - data2[i + 1]);
data3[i + 1] = b < 0 ? -b : b;
b = (data1[i + 2] - data2[i + 2]);
data3[i + 2] = b < 0 ? -b : b;
b = (data1[i + 3] - data2[i + 3]);
data3[i + 3] = b < 0 ? -b : b;
b = (data1[i + 4] - data2[i + 4]);
data3[i + 4] = b < 0 ? -b : b;
b = (data1[i + 5] - data2[i + 5]);
data3[i + 5] = b < 0 ? -b : b;
b = (data1[i + 6] - data2[i + 6]);
data3[i + 6] = b < 0 ? -b : b;
b = (data1[i + 7] - data2[i + 7]);
data3[i + 7] = b < 0 ? -b : b;
}
This is not even its final form. I will try to do some more heretic tricks on it.
As I mentioned, there was still room for improvement.
After cutting out LINQ, the main tick muncher was Abs(). Once it was removed from the code, we were left with a contest between if-else and the shorthand ?: operator. Both are branching constructs, which once upon a time were widely known to be slower than linear code. Nowadays ease of use/writing tends to be picked over performance (sometimes correctly, sometimes incorrectly).
So let's make our branching condition linear. This is possible by abusing the fact that the branching in this code contains math operating on just a single variable.
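For reference, the branchy version being replaced looks roughly like this (a sketch, matching the loop body above):

int b = data1[i] - data2[i];
if (b < 0)
    data3[i] = -b;  // negate when the difference is negative
else
    data3[i] = b;   // otherwise keep it as-is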
Now, do you remember how to negate a two's complement number? Negate all the bits and add one. Let's do it in one line, without conditions, then!
It is the bitwise operators' time to shine. OR and AND are boring; real men use XOR. What's so cool about XOR? Aside from its usual behavior, you can also turn it into NOT (negation) and NOP (no operation).
1 XOR 1 = 0
0 XOR 1 = 1
so XOR'ing with a value filled with only 1's gives you a NOT operation.
1 XOR 0 = 1
0 XOR 0 = 0
so XOR'ing with a value filled with only 0's does nothing at all.
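A quick sanity check in C# (illustrative values only):

int x = 42;
Console.WriteLine(x ^ -1); // -43, i.e. ~x : XOR with all 1's acts as NOT
Console.WriteLine(x ^ 0);  // 42 : XOR with all 0's changes nothing (NOP)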
We can obtain the sign of our number. For a 32-bit integer it is as simple as x >> 31. It moves the sign bit into the lowest bit. As even the wiki will tell you, the bits inserted from the left will be zeros, so the result of x >> 31 will be 1 for a negative number (x < 0) and 0 for a non-negative one (x >= 0), right?
Nope. For signed values an arithmetic shift is used instead of a plain bit shift, so we get -1 or 0 depending on the sign... which means that x >> 31 gives 111...111 for a negative value and 000...000 for a non-negative one. If you XOR the original x with the result of such a shift, you perform a NOT or a NOP depending on the value's sign. Another useful thing is that 0 acts as a NOP for addition/subtraction, so we can subtract -1 or 0 depending on the value's sign.
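You can see the arithmetic shift at work with two illustrative values:

Console.WriteLine(-5 >> 31); // -1 : all bits set for a negative value
Console.WriteLine( 5 >> 31); //  0 : all bits clear for a non-negative value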
So x ^ (x >> 31) will flip the bits of a negative number while making no changes to a non-negative one, and x - (x >> 31) will add 1 to a negative x (negating a negative value gives the positive one) and make no changes to a non-negative value.
When combined you get (x ^ (x >> 31)) - (x >> 31)... which can be translated to:
IF x < 0
    x = ~x + 1
and that is just:
IF x < 0
    x = -x
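Wrapped into a small helper, this is a sketch of what the XorAbs() mentioned below could look like (same one-liner, just named):

// Branch-free absolute value for a 32-bit signed integer.
// Note: like any 32-bit abs, int.MinValue has no positive counterpart.
static int XorAbs(int x)
{
    int sign = x >> 31;       // -1 when x is negative, 0 otherwise
    return (x ^ sign) - sign; // flip the bits and add 1 only for negative x
}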
How does it affect performance? Our XorAbs() requires just four basic integer operations, plus one load and one store. The branching operator itself takes about as many CPU ticks. And while modern CPUs are great at branch prediction, they are still faster when they simply do not have to do it, i.e. when fed sequential code.
And what's the score?
Resulting code:
for (int i = 0; i < data1.Length; i++)
{
int x = data1[i] - data2[i];
data3[i] = (x ^ (x >> 31)) - (x >> 31);
}
CPUs have super fast cache memory; when processing an array sequentially, whole chunks of it get copied into the cache. But if you write crappy code, you will get cache misses. You can easily fall into this trap by screwing up the order of nested loops, as in the sketch below.
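A sketch of that nested-loop trap with a 2D array (C# rectangular arrays are laid out row by row, so only the first version walks memory sequentially):

int[,] grid = new int[1024, 1024];
long sum = 0;

// Cache-friendly: the inner loop visits consecutive memory locations.
for (int row = 0; row < 1024; row++)
    for (int col = 0; col < 1024; col++)
        sum += grid[row, col];

// Cache-hostile: the inner loop jumps a whole row ahead on every step.
for (int col = 0; col < 1024; col++)
    for (int row = 0; row < 1024; row++)
        sum += grid[row, col];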
Parallelism (multiple threads, same data) must work on sequential chunks in order to make good use of the CPU cache.
Writing threads by hand allows you to pick the chunks for each thread manually, but it is a bothersome way to do it. Since 4.0, .NET comes with helpers for that; however, the default Parallel.For makes a mess of the cache, so the code below is actually slower than its single-threaded version due to cache misses.
Parallel.For(0, data1.Length,
fn =>
{
int x = data1[fn] - data2[fn];
data3[fn] = (x ^ (x >> 31)) - (x >> 31);
});
It is possible to make manual use of the cached data by performing sequential operations on it. For example, you can unroll the loop, but it is a dirty hack, and unrolling has performance issues of its own (it depends on the CPU model).
Parallel.For(0, data1.Length >> 3,
block =>
{
    // Each call handles 8 consecutive elements.
    int i = block << 3;
    int b;
    b = (data1[i + 0] - data2[i + 0]);
    data3[i + 0] = b < 0 ? (b ^ -1) + 1 : b;
    b = (data1[i + 1] - data2[i + 1]);
    data3[i + 1] = b < 0 ? (b ^ -1) + 1 : b;
    b = (data1[i + 2] - data2[i + 2]);
    data3[i + 2] = b < 0 ? (b ^ -1) + 1 : b;
    b = (data1[i + 3] - data2[i + 3]);
    data3[i + 3] = b < 0 ? (b ^ -1) + 1 : b;
    b = (data1[i + 4] - data2[i + 4]);
    data3[i + 4] = b < 0 ? (b ^ -1) + 1 : b;
    b = (data1[i + 5] - data2[i + 5]);
    data3[i + 5] = b < 0 ? (b ^ -1) + 1 : b;
    b = (data1[i + 6] - data2[i + 6]);
    data3[i + 6] = b < 0 ? (b ^ -1) + 1 : b;
    b = (data1[i + 7] - data2[i + 7]);
    data3[i + 7] = b < 0 ? (b ^ -1) + 1 : b;
});
However, .NET also has Parallel.ForEach and load-balancing partitioners. By using both of them you get the best of all worlds.
So the final code would be:
var rangePartitioner = Partitioner.Create(0, data1.Length);
Parallel.ForEach(rangePartitioner, (range, loopState) =>
{
for (int i = range.Item1; i < range.Item2; i++)
{
int x = data1[i] - data2[i];
data3[i] = (x ^ (x >> 31)) - (x >> 31);
}
});
It is far from maximum CPU usage (which is more complicated than just maxing out the clock; there are multiple cache levels, several pipelines and much more), but it is readable, fast and platform independent (except for integer size, but the C# int is an alias for System.Int32, so we are safe here).
I think we will stop with the optimization here. It came out as an article rather than an answer; I hope no one will purge me for it.
Here is another (less readable but maybe a little bit more efficient) approach that does not need LINQ:
public static int[] GetDifference(int[] first, int[] second)
{
int commonLength = Math.Min(first.Length, second.Length);
int[] diff = new int[commonLength];
for (int i = 0; i < commonLength; i++)
diff[i] = Math.Abs(first[i] - second[i]);
return diff;
}
Why a little bit more efficient? Because ToArray has to resize the array repeatedly, since it does not know the final size in advance.
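For completeness, a LINQ Join on the element indices can also produce the same result (shown below), though it is typically slower than the Zip approach: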
var data3 = data1.Select((x,i)=>new {x,i})
.Join
(
data2.Select((x,i)=>new {x,i}),
x=>x.i,
x=>x.i,
(d1,d2)=>Math.Abs(d1.x-d2.x)
)
.ToArray();