I need to bitwise-OR two very large (>1 GB) Array[Byte]s in Spark (so in Scala).
I'm looking for the most efficient way (in terms of speed and memory), which rules out anything like zip, since that would box every byte into a new collection of pairs.
For now I'm using the following method, but I'd like to know if any of you have a better idea...
def bitor(x: Array[Byte], y: Array[Byte]): Array[Byte] = {
  for (i <- 0 until x.length) {
    x(i) = (x(i) | y(i)).toByte
  }
  x
}
Should I go through JNI and compute it in native C?
Your for comprehension desugars into a foreach call on a Range, which compiles to the equivalent of this Java code:
public final class _$$anon$1$$anonfun$bitor$1 extends AbstractFunction1$mcVI$sp implements Serializable {
    private final byte[] x$1;
    private final byte[] y$1;

    public _$$anon$1$$anonfun$bitor$1(byte[] x$1, byte[] y$1) {
        this.x$1 = x$1;
        this.y$1 = y$1;
    }

    @Override
    public final void apply(final int i) {
        this.apply$mcVI$sp(i);
    }

    // the specialized (non-boxing) body of the loop, invoked once per index
    @Override
    public void apply$mcVI$sp(final int i) {
        this.x$1[i] |= this.y$1[i];
    }
}
private byte[] bitor(final byte[] x, final byte[] y) {
    // allocates a Range plus the closure above, then makes one call into it per element
    RichInt.until$extension0(Predef.intWrapper(0), x.length)
            .foreach(new _$$anon$1$$anonfun$bitor$1(x, y));
    return x;
}
However, if you replace the for comprehension with a while loop, things change:
def bitor(x: Array[Byte], y: Array[Byte]): Array[Byte] = {
  var i = 0
  while (i < x.length) {
    x(i) = (x(i) | y(i)).toByte
    i += 1
  }
  x
}
This compiles down to a plain counted loop:
private byte[] bitor(final byte[] x, final byte[] y) {
    for (int i = 0; i < x.length; ++i) {
        x[i] |= y[i];
    }
    return x;
}
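So a plain while loop already compiles to the tight loop you would write by hand in C, with no closure allocation and no per-element call. That makes JNI hard to justify: moving >1 GB across the JNI boundary has its own copying/pinning costs that would likely eat any gain. If you want to squeeze further while staying on the JVM, one idea is to OR eight bytes at a time through a ByteBuffer view of the arrays. A sketch, assuming both arrays have the same length (bitorLong is my name for it):

import java.nio.{ByteBuffer, ByteOrder}

def bitorLong(x: Array[Byte], y: Array[Byte]): Array[Byte] = {
  // heap buffers wrap the arrays, so putLong writes straight into x
  val xb = ByteBuffer.wrap(x).order(ByteOrder.nativeOrder())
  val yb = ByteBuffer.wrap(y).order(ByteOrder.nativeOrder())
  val wholeWords = x.length & ~7 // bytes covered by complete 8-byte words
  var i = 0
  while (i < wholeWords) {
    xb.putLong(i, xb.getLong(i) | yb.getLong(i))
    i += 8
  }
  while (i < x.length) { // tail that doesn't fill a whole long
    x(i) = (x(i) | y(i)).toByte
    i += 1
  }
  x
}

Whether this actually beats the byte-at-a-time while loop depends on your JVM; recent HotSpot versions can auto-vectorize simple byte loops on their own, so benchmark both on your data before committing to it.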