Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient bitwise OR of two Byte[Array]

I need to bitor 2 very large (>1GB) ByteArray in Spark (so using Scala).

I'm look for the most efficient way (in term of speed and memory), which means I don't want to use stuff like the 'zip' method that will transform my array to a list.

For now, I'm using the following method but I would like to know if some of you have other idea...

def bitor(x: Array[Byte], y: Array[Byte]) : Array[Byte] = {
  for(i <- 0 to x.size) {
    x(i) = (x(i) | y(i)).toByte
  }
  return x
}

Should I go through JNI and compute it in native C ?

like image 236
elldekaa Avatar asked Jan 21 '26 12:01

elldekaa


1 Answers

Your code that use foreach desugars into equivalent of this java code:

public final class _$$anon$1$$anonfun$bitor$1 extends AbstractFunction1$mcVI$sp implements Serializable {
    private final byte[] x$1;
    private final byte[] y$1;

    public _$$anon$1$$anonfun$bitor$1(byte[] x$1, byte[] y$1) {
        this.x$1 = x$1;
        this.y$1 = y$1;
    }

    @Override
    public final void apply(final int i) {
        this.apply$mcVI$sp(i);
    }

    @Override
    public void apply$mcVI$sp(final int i) {
        this.x$1[i] |= this.y$1[i];
    }
}


private byte[] bitor(final byte[] x, final byte[] y) {
    RichInt.to$extension0(Predef.intWrapper(0), Predef.byteArrayOps(x).size())
            .foreach(new _$$anon$1$$anonfun$bitor$1(x, y));
    return x;
}

However, if you replace for comprehension with while, things change:

def bitor(x: Array[Byte], y: Array[Byte]) : Array[Byte] = {
  var i = 0
  while (i < x.length) {
    x(i) = (x(i) | y(i)).toByte
    i += 1
  }

  x
}

Is transpiled to:

private byte[] bitor(final byte[] x, final byte[] y) {
    for (int i = 0; i < x.length; ++i) {
        x[i] |= y[i];
    }
    return x;
}
like image 95
Aivean Avatar answered Jan 23 '26 04:01

Aivean