
Differences between Storable and Unboxed Vectors

So ... until now I've used unboxed vectors (from the vector package) by default, without giving it much consideration. vector-th-unbox makes creating instances for them a breeze, so why not?

Now I ran into a case where I cannot derive those instances automatically: a data type with phantom type parameters (as in Vector (s :: Nat) a, where s encodes the length).

This made me think about the differences between Storable and Unboxed vectors. Things I figured out on my own:

  • Unboxed stores e.g. tuples as separate vectors, which improves cache locality: no bandwidth is wasted when only one of the components is needed.
  • Storable will still be compiled to simple (and probably efficient) readArray#s that return unboxed values (as is evident from reading the Core).
  • Storable allows direct pointer access, which allows interoperability with foreign code. Unboxed doesn't.
  • [edit] Storable instances are actually easier to write by hand than Unbox ones (that is, the Vector and MVector instances).
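
To make the layout difference in the list above concrete, here is a minimal sketch. The P2 type and the helper names are hypothetical and exist only for illustration. It shows a hand-written Storable instance (interleaved layout) next to an unboxed vector of pairs, which the vector package represents as two separate vectors, so U.unzip does not copy any elements.

```haskell
module Main where

import qualified Data.Vector.Storable as S
import qualified Data.Vector.Unboxed  as U
import Foreign.Ptr      (castPtr)
import Foreign.Storable (Storable (..))

-- Hypothetical 2D point type, used only for illustration.
data P2 = P2 !Double !Double
  deriving (Eq, Show)

-- Hand-written Storable instance: two Doubles laid out back to back.
instance Storable P2 where
  sizeOf    _ = 2 * sizeOf (undefined :: Double)
  alignment _ = alignment (undefined :: Double)
  peek p = do
    x <- peekElemOff (castPtr p) 0
    y <- peekElemOff (castPtr p) 1
    pure (P2 x y)
  poke p (P2 x y) = do
    pokeElemOff (castPtr p) 0 x
    pokeElemOff (castPtr p) 1 y

-- Storable layout: x0 y0 x1 y1 x2 y2 (components interleaved).
points :: S.Vector P2
points = S.fromList [P2 1 10, P2 2 20, P2 3 30]

-- Unboxed layout: one vector of first components, one of second components.
pairs :: U.Vector (Double, Double)
pairs = U.fromList [(1, 10), (2, 20), (3, 30)]

-- Only the vector of first components is traversed here.
sumXsUnboxed :: U.Vector (Double, Double) -> Double
sumXsUnboxed = U.sum . fst . U.unzip

-- Every element is read in full, although only x is used.
sumXsStorable :: S.Vector P2 -> Double
sumXsStorable = S.foldl' (\acc (P2 x _) -> acc + x) 0

main :: IO ()
main = do
  print (sumXsUnboxed pairs)
  print (sumXsStorable points)
```

Whether the separate-vector layout actually pays off depends on the access pattern, so this is something to benchmark rather than assume.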

That alone doesn't make it clear to me why Unboxed even exists; there seems to be little benefit to it. Am I missing something?

fho asked Oct 21 '16 12:10


1 Answer

Cribbed from https://haskell-lang.org/library/vector

Storable and unboxed vectors both store their data in a byte array, avoiding pointer indirection. This is more memory efficient and allows better usage of caches. The distinction between storable and unboxed vectors is subtle:

  • Storable vectors require data which is an instance of the Storable type class. This data is stored in malloced memory, which is pinned (the garbage collector can't move it around). This can lead to memory fragmentation, but allows the data to be shared over the C FFI.
  • Unboxed vectors require data which is an instance of the Unbox type class (for primitive element types this is backed by a Prim instance). This data is stored in GC-managed unpinned memory, which helps avoid memory fragmentation. However, this data cannot be shared over the C FFI.
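
As a rough illustration of the pointer access that storable vectors allow, the sketch below uses unsafeWith from Data.Vector.Storable to obtain a Ptr to the vector's elements. The function sumViaPtr is a hypothetical stand-in written in Haskell; in real code it would typically be a C function brought in with a foreign import declaration.

```haskell
module Main where

import qualified Data.Vector.Storable as S
import Foreign.C.Types  (CDouble)
import Foreign.Ptr      (Ptr)
import Foreign.Storable (peekElemOff)

-- Hypothetical stand-in for a C function that takes (double *xs, int n).
-- It reads n elements through the raw pointer and adds them up.
sumViaPtr :: Ptr CDouble -> Int -> IO CDouble
sumViaPtr ptr n = go 0 0
  where
    go acc i
      | i >= n    = pure acc
      | otherwise = do
          x <- peekElemOff ptr i
          go (acc + x) (i + 1)

-- Hand the vector's underlying buffer to the pointer-based function.
-- The pointer must not be used after unsafeWith returns.
sumStorable :: S.Vector CDouble -> IO CDouble
sumStorable v = S.unsafeWith v (\p -> sumViaPtr p (S.length v))

main :: IO ()
main = do
  total <- sumStorable (S.fromList [1.5, 2.5, 3.0])
  print total
```

Data.Vector.Unboxed has no counterpart to unsafeWith, which is the practical meaning of "cannot be shared over the C FFI" above.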

Both the Storable and Prim type classes provide a way to store a value as bytes, and to load bytes into a value. The distinction is what type of byte array is used.

As usual, the only true measure of performance will be benchmarking. However, as a general guideline:

  • If you don't need to pass values to a C FFI, and you have a Prim instance, use unboxed vectors.
  • If you have a Storable instance, use a storable vector.
  • Otherwise, use a boxed vector.
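
Since benchmarking is the real test, it can help to write the code under test once against Data.Vector.Generic and then swap the representation. The following is only a sketch; mean is a hypothetical example function, not part of the vector package.

```haskell
{-# LANGUAGE FlexibleContexts #-}
module Main where

import qualified Data.Vector          as V
import qualified Data.Vector.Generic  as G
import qualified Data.Vector.Storable as S
import qualified Data.Vector.Unboxed  as U

-- Works for any vector flavour that can hold Doubles.
mean :: G.Vector v Double => v Double -> Double
mean xs = G.sum xs / fromIntegral (G.length xs)

main :: IO ()
main = do
  let samples = [1, 2, 3, 4] :: [Double]
  print (mean (V.fromList samples))  -- boxed
  print (mean (U.fromList samples))  -- unboxed
  print (mean (S.fromList samples))  -- storable
```

With this shape, comparing the three representations in a benchmark only requires changing which fromList is used to build the input.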

There are also other issues to consider, such as the fact that boxed vectors are instances of Functor while storable and unboxed vectors are not.
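
A small sketch of what the Functor point means in practice (the names below are illustrative only): fmap works on boxed vectors, while unboxed vectors need the module's own map, because its Unbox constraint on the element types cannot be expressed in a Functor instance.

```haskell
module Main where

import qualified Data.Vector         as V
import qualified Data.Vector.Unboxed as U

-- Boxed vectors are Functors, so plain fmap is available.
boxedInc :: V.Vector Int
boxedInc = fmap (+ 1) (V.fromList [1, 2, 3])

-- Unboxed vectors are not; U.map requires Unbox for input and output types.
unboxedInc :: U.Vector Int
unboxedInc = U.map (+ 1) (U.fromList [1, 2, 3])

main :: IO ()
main = do
  print (V.toList boxedInc)
  print (U.toList unboxedInc)
```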

Michael Snoyman answered Nov 01 '22 00:11