Best Scala collection type for vectorized numerical computing

Question

Looking for the proper data type (such as IndexedSeq[Double]) to use when designing a domain-specific numerical computing library. For this question, I'm limiting scope to working with 1-Dimensional arrays of Double. The library will define a number functions that are typically applied for each element in the 1D array.

Considerations:

Prefer immutable data types, such as Vector or IndexedSeq
Want to minimize data conversions
Reasonably efficient in space and time
Friendly for other people using the library
Elegant and clean API

Should I use something higher up the collections hierarchy, such as Seq?

Or is it better to just define the single-element functions and leave the mapping/iterating to the end user?

This seems less efficient (since some computations could be done once per set of calls), but at at the same time a more flexible API, since it would work with any type of collection.

Any recommendations?

Rex Kerr · Accepted Answer

If your computations are to do anything remotely computationally intensive, use Array, either raw or wrapped in your own classes. You can provide a collection-compatible wrapper, but make that an explicit wrapper for interoperability only. Everything other than Array is generic and thus boxed and thus comparatively slow and bulky.

If you do not use Array, people will be forced to abandon whatever things you have and just use Array instead when performance matters. Maybe that's okay; maybe you want the computations to be there for convenience not efficiency. In that case, I suggest using IndexedSeq for the interface, assuming that you want to let people know that indexing is not outrageously slow (e.g. is not List), and use Vector under the hood. You will use about 4x more memory than Array[Double], and be 3-10x slower for most low-effort operations (e.g. multiplication).

For example, this:

val u = v.map(1.0 / _)   //  v is Vector[Double]

is about three times slower than this:

val u = new Array[Double](v.length)
var j = 0
while (j<u.length) {
  u(j) = 1.0/v(j)      // v is Array[Double]
  j += 1
}

If you use the map method on Array, it's just as slow as the Vector[Double] way; operations on Array are generic and hence boxed. (And that's where the majority of the penalty comes from.)

Best Scala collection type for vectorized numerical computing

Tags:

scala

scala-collections

api

numerical-computing

supyo

1 Answers

Rex Kerr

Recent Activity

Donate For Us

Best Scala collection type for vectorized numerical computing

Tags:

scala

scala-collections

api

numerical-computing

supyo

1 Answers

Rex Kerr

Related questions

Recent Activity

Donate For Us