Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scala find missing values in a range

For a given range, for instance

val range = (1 to 5).toArray
val ready = Array(2,4)

the missing values (not ready) are

val missing = range.toSet diff ready.toSet
Set(5, 1, 3)

The real use case includes thousands of range instances with (possibly) thousands of missing or not ready values. Is there a more time-efficient approach in Scala?

like image 655
elm Avatar asked Sep 10 '15 08:09

elm


People also ask

How do you find the missing value of a range?

Let's start by reminding ourselves that the range of a set of data is the largest value subtract the smallest value. So if we look at our data set, ignoring 𝑘 for the time being, then the range can be found by our largest value, nine, subtract our smallest value, six, which is a range of three.


1 Answers

The diff operation is implemented in Scala as a foldLeft over the left operand where each element of the right operand is removed from the left collection. Let's assume that the left and right operand have m and n elements, respectively.

Calling toSet on an Array or Range object will return a HashTrieSet, which is a HashSet implementation and, thus, offers a remove operation with complexity of almost O(1). Thus, the overall complexity for the diff operation is O(m).

Considering now a different approach, we'll see that this is actually quite good. One could also solve the problem by sorting both ranges and then traversing them once in a merge-sort fashion to eliminate all elements which occur in both ranges. This will give you a complexity of O(max(m, n) * log(max(m, n))), because you have to sort both ranges.

Update

I ran some experiments to investigate whether you can speed up your computation by using mutable hash sets instead of immutable. The result as shown in the tables below is that it depends on the size ration of range and ready.

It seems as if using immutable hash sets is more efficient if the ready.size/range.size < 0.2. Above this ratio, the mutable hash sets outperform the immutable hash sets.

For my experiments, I set range = (1 to n), with n being the number of elements in range. For ready I selected a random sub set of range with the respective number of elements. I repeated each run 20 times and summed up the times calculated with System.currentTimeMillis().

range.size == 100000
+-----------+-----------+---------+
| Fraction  | Immutable | Mutable |
+-----------+-----------+---------+
| 0.01      |        28 |     111 |
| 0.02      |        23 |     124 |
| 0.05      |        39 |     115 |
| 0.1       |       113 |     129 |
| 0.2       |       174 |     140 |
| 0.5       |       472 |     200 |
| 0.75      |       722 |     203 |
| 0.9       |       786 |     202 |
| 1.0       |       743 |     212 |
+-----------+-----------+---------+

range.size == 500000
+-----------+-----------+---------+
| Fraction  | Immutable | Mutable |
+-----------+-----------+---------+
| 0.01      |        73 |     717 |
| 0.02      |       140 |     771 |
| 0.05      |       328 |     722 |
| 0.1       |       538 |     706 |
| 0.2       |      1053 |     836 |
| 0.5       |      2543 |    1149 |
| 0.75      |      3539 |    1260 |
| 0.9       |      4171 |    1305 |
| 1.0       |      4403 |    1522 |
+-----------+-----------+---------+
like image 178
Till Rohrmann Avatar answered Oct 16 '22 22:10

Till Rohrmann