Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best Scala collection type for vectorized numerical computing

Looking for the proper data type (such as IndexedSeq[Double]) to use when designing a domain-specific numerical computing library. For this question, I'm limiting scope to working with 1-Dimensional arrays of Double. The library will define a number functions that are typically applied for each element in the 1D array.

Considerations:

  • Prefer immutable data types, such as Vector or IndexedSeq
  • Want to minimize data conversions
  • Reasonably efficient in space and time
  • Friendly for other people using the library
  • Elegant and clean API

Should I use something higher up the collections hierarchy, such as Seq?

Or is it better to just define the single-element functions and leave the mapping/iterating to the end user?

This seems less efficient (since some computations could be done once per set of calls), but at at the same time a more flexible API, since it would work with any type of collection.

Any recommendations?

like image 744
supyo Avatar asked Dec 05 '12 15:12

supyo


1 Answers

If your computations are to do anything remotely computationally intensive, use Array, either raw or wrapped in your own classes. You can provide a collection-compatible wrapper, but make that an explicit wrapper for interoperability only. Everything other than Array is generic and thus boxed and thus comparatively slow and bulky.

If you do not use Array, people will be forced to abandon whatever things you have and just use Array instead when performance matters. Maybe that's okay; maybe you want the computations to be there for convenience not efficiency. In that case, I suggest using IndexedSeq for the interface, assuming that you want to let people know that indexing is not outrageously slow (e.g. is not List), and use Vector under the hood. You will use about 4x more memory than Array[Double], and be 3-10x slower for most low-effort operations (e.g. multiplication).

For example, this:

val u = v.map(1.0 / _)   //  v is Vector[Double]

is about three times slower than this:

val u = new Array[Double](v.length)
var j = 0
while (j<u.length) {
  u(j) = 1.0/v(j)      // v is Array[Double]
  j += 1
}

If you use the map method on Array, it's just as slow as the Vector[Double] way; operations on Array are generic and hence boxed. (And that's where the majority of the penalty comes from.)

like image 177
Rex Kerr Avatar answered Sep 28 '22 15:09

Rex Kerr