I want to get an array of bytes (Array[Byte]) from somewhere (read from file, from socket, etc) and then provide a efficient way to pull bits out of it (e.g. provide a function to extract a 32-bit integer from offset N in array). I would then like to wrap the byte array (hiding it) providing functions to pull bits out from the array (probably using lazy val for each bit to pull out).
I would imagine having a wrapping class that takes an immutable byte array type in the constructor to prove the array contents is never modified. IndexedSeq[Byte] seemed relevant, but I could not work out how to go from Array[Byte] to IndexedSeq[Byte].
Part 2 of the question is if I used IndexedSeq[Byte] will the resultant code be any slower? I need the code to execute as fast as possible, so would stick with Array[Byte] if the compiler could do a better job with it.
I could write a wrapper class around the array, but that would slow things down - one extra level of indirection for each access to bytes in the array. Performance is critical due to the number of array accesses that will be required. I need fast code, but would like to do the code nicely at the same time. Thanks!
PS: I am a Scala newbie.
Treating Array[T]
as an IndexedSeq[T]
could hardly be simpler:
Array(1: Byte): IndexedSeq[Byte] // trigger an Implicit View
wrapByteArray(Array(1: Byte)) // explicitly calling
Unboxing will kill you long before an extra layer of indirection.
C:\>scala -Xprint:erasure -e "{val a = Array(1: Byte); val b1: Byte = a(0); val
b2 = (a: IndexedSeq[Byte])(0)}"
[[syntax trees at end of erasure]]// Scala source: scalacmd5680604016099242427.s
cala
val a: Array[Byte] = scala.Array.apply((1: Byte), scala.this.Predef.
wrapByteArray(Array[Byte]{}));
val b1: Byte = a.apply(0);
val b2: Byte = scala.Byte.unbox((scala.this.Predef.wrapByteArray(a): IndexedSeq).apply(0));
To avoid this, the Scala collections library should be specialized on the element type, in the same style as Tuple1
and Tuple2
. I'm told this is planned, but it's a bit more involved than simply slapping @specialized
everywhere, so I don't know how long it will take.
UPDATE
Yes, WrappedArray
is mutable, although collection.IndexedSeq[Byte]
doesn't have methods to mutate, so you could just trust clients not to cast to a mutable interface. The next release of Scalaz will include ImmutableArray which prevents this.
The boxing comes retrieving an element from the collection via this generic method:
trait SeqLike[+A, +Repr] extends IterableLike[A, Repr] { self =>
def apply(idx: Int): A
}
At the JVM level, this signature is type-erased to:
def apply(idx: Int): Object
If your collection contains primitives, that is, subtypes of AnyVal
, they must be boxed in the corresponding wrapper to be returned from this method. For some applications, this is a major performance concern. Entire libraries have been written in Java to avoid this, notably fastutils.
Annotation directed specialization was added to Scala 2.8 to instruct the compiler to generate various versions of a class or method tailored to the permutations of primitive types. This has been applied to a few places in the standard library already, e.g. TupleN
, ProductN
, Function{0, 1, 2}
. If this was also applied to the collections hierarchy, this performance cost could be alleviated.
If you want to work with sequences in Scala, I recommend you choose one of these:
Immutable seqs:
(linked seqs) List, Stream, Queue
(indexed seqs) Vector
Mutable seqs:
(linked seq) ListBuffer
(indexed seq) ArrayBuffer
The new (2.8) Scala collections have been hard to grasp for me, primarily due to shortage of (correct) documentation but also because of the source code (complex hierarchys). To clear my mind I made this pic to visualize the basic structure:
(source: programmera.net)
Also, note that Array
is not part of the tree structure, it is a special case, since it wraps the Java array (which is a special case in Java).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With