Scala find missing values in a range

Tags:

scala-collections

For a given range, for instance

val range = (1 to 5).toArray
val ready = Array(2,4)

the missing values (not ready) are

val missing = range.toSet diff ready.toSet
Set(5, 1, 3)

The real use case includes thousands of range instances with (possibly) thousands of missing or not ready values. Is there a more time-efficient approach in Scala?

655

asked Sep 10 '15 08:09

1 Answers

The diff operation is implemented in Scala as a foldLeft over the left operand where each element of the right operand is removed from the left collection. Let's assume that the left and right operand have m and n elements, respectively.

Calling toSet on an Array or Range object will return a HashTrieSet, which is a HashSet implementation and, thus, offers a remove operation with complexity of almost O(1). Thus, the overall complexity for the diff operation is O(m).

Considering now a different approach, we'll see that this is actually quite good. One could also solve the problem by sorting both ranges and then traversing them once in a merge-sort fashion to eliminate all elements which occur in both ranges. This will give you a complexity of O(max(m, n) * log(max(m, n))), because you have to sort both ranges.

Update

I ran some experiments to investigate whether you can speed up your computation by using mutable hash sets instead of immutable. The result as shown in the tables below is that it depends on the size ration of range and ready.

It seems as if using immutable hash sets is more efficient if the ready.size/range.size < 0.2. Above this ratio, the mutable hash sets outperform the immutable hash sets.

For my experiments, I set range = (1 to n), with n being the number of elements in range. For ready I selected a random sub set of range with the respective number of elements. I repeated each run 20 times and summed up the times calculated with System.currentTimeMillis().

range.size == 100000
+-----------+-----------+---------+
| Fraction  | Immutable | Mutable |
+-----------+-----------+---------+
| 0.01      |        28 |     111 |
| 0.02      |        23 |     124 |
| 0.05      |        39 |     115 |
| 0.1       |       113 |     129 |
| 0.2       |       174 |     140 |
| 0.5       |       472 |     200 |
| 0.75      |       722 |     203 |
| 0.9       |       786 |     202 |
| 1.0       |       743 |     212 |
+-----------+-----------+---------+

range.size == 500000
+-----------+-----------+---------+
| Fraction  | Immutable | Mutable |
+-----------+-----------+---------+
| 0.01      |        73 |     717 |
| 0.02      |       140 |     771 |
| 0.05      |       328 |     722 |
| 0.1       |       538 |     706 |
| 0.2       |      1053 |     836 |
| 0.5       |      2543 |    1149 |
| 0.75      |      3539 |    1260 |
| 0.9       |      4171 |    1305 |
| 1.0       |      4403 |    1522 |
+-----------+-----------+---------+

178

answered Oct 16 '22 22:10

Till Rohrmann

Related questions
                            
                                Filter a list by item index?
                            
                                How to tune Play Framework application with proper threadpools?
                            
                                How to use a type constraint with abstract type
                            
                                Scala Futures not running in parallel
                            
                                could not find implicit ...: akka.http.server.RoutingSetup
                            
                                Understanding `Monomorphism` Example of Shapeless
                            
                                How to use PathFilter in Apache Spark?
                            
                                SBT Scala cross versions, with aggregation and dependencies
                            
                                Why is Scala's tail recursion slower than that of Java?
                            
                                How Do Callbacks work in Non-blocking Design?
                            
                                How i can integrate Apache Spark with the Play Framework to display predictions in real time?
                            
                                Simplest method for text lemmatization in Scala and Spark
                            
                                Why does this partial application not compile?
                            
                                Mocking Scala void function using Mockito
                            
                                How to do generic tuple -> case class conversion in Scala?
                            
                                Scala's monadic chaining of Try
                            
                                Play Framework - ExecutionContext cannot be resolved when trying to map a Promise
                            
                                Processing multiple files as independent RDD's in parallel
                            
                                How to convert a map to Spark's RDD
                            
                                Use spark in a sbt project in intellij

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With