Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scala Enumeration ValueSet.isEmpty slow

I am using Scala Enumeration ValueSets in a fairly high-throughput setting - creating, testing, union'ing and intersecting about 10M sets/second/core. I didn't expect this to be a big deal, because I had read somewhere that they were backed by BitSets, but surprisingly ValueSet.isEmpty showed up as a hot spot in a profiling session with YourKit.

To verify, I decided to try and reimplement what I needed using the Java BitSet, while trying to retain some of the type-safety of using Scala Enumerations. (code review moved to https://codereview.stackexchange.com/questions/74795/scala-bitset-implemented-with-java-bitset-for-use-in-scala-enumerations-to-repl ) The good news is, changing just my ValueSets to these BitSets did indeed lop off 25% of my run-time, so I don't know what ValueSet is really doing under the hood but it could be improved...

EDIT: Reviewing the ValueSet source seems to indicate that isEmpty is definitely O(N), implemented using the general SetLike.isEmpty. Considering ValueSet is implemented with a BitSet, is this a bug?

EDIT: This was the backtrace from the profiler. This seems like a crazy way to implement isEmpty on a bitset.

Backtrace of hot spot in YourKit

like image 503
experquisite Avatar asked Dec 24 '14 06:12

experquisite


1 Answers

For the record, I'm all for looking under the hood, but this design asks too much of any mortal coder.

The immortals, of course, have infinite time at their disposal.

Enumeration.ValueSet is backed by a BitSet but is not one itself. Something about favoring composition.

[Did you hear about the heir to a fortune who gave it all up to pursue his love of music? He favored composition over inheritance. Did I just make that up or did I hear it at Java One?]

No doubt, ValueSet should delegate more methods to the BitSet, including isEmpty.

I was going to suggest trying values.iterator.isEmpty, but that just tests hasNext which loops through all possible values checking for contains.

https://github.com/scala/scala/blob/v2.11.4/src/library/scala/collection/BitSetLike.scala#L109

If I'm reading that correctly.

The best option is e.values.toBitMask forall (_ == 0), which is the moral equivalent of BitSet.isEmpty.

like image 63
som-snytt Avatar answered Nov 02 '22 01:11

som-snytt