Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is most efficient way to do immutable byte arrays in Scala?

I want to get an array of bytes (Array[Byte]) from somewhere (read from file, from socket, etc) and then provide a efficient way to pull bits out of it (e.g. provide a function to extract a 32-bit integer from offset N in array). I would then like to wrap the byte array (hiding it) providing functions to pull bits out from the array (probably using lazy val for each bit to pull out).

I would imagine having a wrapping class that takes an immutable byte array type in the constructor to prove the array contents is never modified. IndexedSeq[Byte] seemed relevant, but I could not work out how to go from Array[Byte] to IndexedSeq[Byte].

Part 2 of the question is if I used IndexedSeq[Byte] will the resultant code be any slower? I need the code to execute as fast as possible, so would stick with Array[Byte] if the compiler could do a better job with it.

I could write a wrapper class around the array, but that would slow things down - one extra level of indirection for each access to bytes in the array. Performance is critical due to the number of array accesses that will be required. I need fast code, but would like to do the code nicely at the same time. Thanks!

PS: I am a Scala newbie.

like image 344
Alan Kent Avatar asked Jul 28 '10 06:07

Alan Kent


2 Answers

Treating Array[T] as an IndexedSeq[T] could hardly be simpler:

Array(1: Byte): IndexedSeq[Byte] // trigger an Implicit View
wrapByteArray(Array(1: Byte))    // explicitly calling

Unboxing will kill you long before an extra layer of indirection.

C:\>scala -Xprint:erasure -e "{val a = Array(1: Byte); val b1: Byte = a(0); val
b2 = (a: IndexedSeq[Byte])(0)}"
[[syntax trees at end of erasure]]// Scala source: scalacmd5680604016099242427.s
cala

val a: Array[Byte] = scala.Array.apply((1: Byte), scala.this.Predef.
wrapByteArray(Array[Byte]{}));
val b1: Byte = a.apply(0);
val b2: Byte = scala.Byte.unbox((scala.this.Predef.wrapByteArray(a): IndexedSeq).apply(0));

To avoid this, the Scala collections library should be specialized on the element type, in the same style as Tuple1 and Tuple2. I'm told this is planned, but it's a bit more involved than simply slapping @specialized everywhere, so I don't know how long it will take.

UPDATE

Yes, WrappedArray is mutable, although collection.IndexedSeq[Byte] doesn't have methods to mutate, so you could just trust clients not to cast to a mutable interface. The next release of Scalaz will include ImmutableArray which prevents this.

The boxing comes retrieving an element from the collection via this generic method:

trait SeqLike[+A, +Repr] extends IterableLike[A, Repr] { self =>
  def apply(idx: Int): A
}

At the JVM level, this signature is type-erased to:

  def apply(idx: Int): Object

If your collection contains primitives, that is, subtypes of AnyVal, they must be boxed in the corresponding wrapper to be returned from this method. For some applications, this is a major performance concern. Entire libraries have been written in Java to avoid this, notably fastutils.

Annotation directed specialization was added to Scala 2.8 to instruct the compiler to generate various versions of a class or method tailored to the permutations of primitive types. This has been applied to a few places in the standard library already, e.g. TupleN, ProductN, Function{0, 1, 2}. If this was also applied to the collections hierarchy, this performance cost could be alleviated.

like image 129
retronym Avatar answered Nov 05 '22 18:11

retronym


If you want to work with sequences in Scala, I recommend you choose one of these:

Immutable seqs:

(linked seqs) List, Stream, Queue

(indexed seqs) Vector

Mutable seqs:

(linked seq) ListBuffer

(indexed seq) ArrayBuffer

The new (2.8) Scala collections have been hard to grasp for me, primarily due to shortage of (correct) documentation but also because of the source code (complex hierarchys). To clear my mind I made this pic to visualize the basic structure:

alt text
(source: programmera.net)

Also, note that Array is not part of the tree structure, it is a special case, since it wraps the Java array (which is a special case in Java).

like image 10
olle kullberg Avatar answered Nov 05 '22 16:11

olle kullberg