Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scala: Most efficent collection for simple iteration

Often I produce a collection on demand to save on instance data size. The consumer iterates through the collection probably only once, before its garbage collected. The consumer doesn't care about the order of the collection doesn't need to sort on it certainly doesn't need to mutate it, or any of its elements. What's the most efficient type safe collection in Scala? - an Array?

Later edit: It occurs to me that there are probably quite a few situations when I could use Sets. Is it good to use Sets when possible or just to use them when set functionality is really needed?

like image 889
Rich Oliver Avatar asked Dec 17 '22 00:12

Rich Oliver


2 Answers

Yes, of all the collection data structures, arrays come with the least amount of overhead when you know their size in advance.

If you don't know the size ahead of time, I would still pick an ArrayBuffer*. The algorithm used to expand the underlying array when it runs out of space is as efficient as it gets.

Do not* use a (linked) List or a Stream because these classes involve one heap allocation per element. Modern JVM garbage collectors are good, but they don't work for free.

*: But see @user unknown's comment on the question for links to some micro-benchmarks. The current ArrayBuffer implementation might be suboptimal.

Also have a look at .view. Often you don't need to actually store intermediate results. Instead you can use .map, .filter and others to build up a "description" of a collection. The operations (map, filter, etc.) will only be performed when you iterate over the collection, often in O(1) space. The downside is, that these views will be re-computed every time they're queried. (though this might still be more efficient with simple filters and huge underlying collections)

Also, be extra careful with views on mutable data structures. Views don't capture the state of the underlying data structure. When it changes, so does the view. Views on immutable data structures, however, behave very nicely. Finally, views obviously contain a reference to the underlying data structure, meaning it won't be garbage collected while your program holds onto the view.

(Updated) Vectors seem to strike a good balance between storage efficiency versus flexibility, especially for large sequences.

like image 194
Christian Klauser Avatar answered Jan 11 '23 06:01

Christian Klauser


Do you need to store the elements? Can't you compute them on-demand? If you can compute the values on-demand instead of storing them, you can create a Traversable or an Iterable that will do the job with barely any memory spent (no memory spent for Traversable, except for the class itself).

like image 43
Daniel C. Sobral Avatar answered Jan 11 '23 07:01

Daniel C. Sobral