For a given range, for instance
val range = (1 to 5).toArray
val ready = Array(2,4)
the missing values (not ready) are
val missing = range.toSet diff ready.toSet
Set(5, 1, 3)
The real use case includes thousands of range instances with (possibly) thousands of missing or not ready values. Is there a more time-efficient approach in Scala?
Let's start by reminding ourselves that the range of a set of data is the largest value subtract the smallest value. So if we look at our data set, ignoring 𝑘 for the time being, then the range can be found by our largest value, nine, subtract our smallest value, six, which is a range of three.
The diff
operation is implemented in Scala as a foldLeft
over the left operand where each element of the right operand is removed from the left collection. Let's assume that the left and right operand have m
and n
elements, respectively.
Calling toSet
on an Array
or Range
object will return a HashTrieSet
, which is a HashSet
implementation and, thus, offers a remove operation with complexity of almost O(1)
. Thus, the overall complexity for the diff
operation is O(m)
.
Considering now a different approach, we'll see that this is actually quite good. One could also solve the problem by sorting both ranges and then traversing them once in a merge-sort fashion to eliminate all elements which occur in both ranges. This will give you a complexity of O(max(m, n) * log(max(m, n)))
, because you have to sort both ranges.
I ran some experiments to investigate whether you can speed up your computation by using mutable hash sets instead of immutable. The result as shown in the tables below is that it depends on the size ration of range
and ready
.
It seems as if using immutable hash sets is more efficient if the ready.size/range.size < 0.2
. Above this ratio, the mutable hash sets outperform the immutable hash sets.
For my experiments, I set range = (1 to n)
, with n
being the number of elements in range
. For ready
I selected a random sub set of range
with the respective number of elements. I repeated each run 20 times and summed up the times calculated with System.currentTimeMillis()
.
range.size == 100000
+-----------+-----------+---------+
| Fraction | Immutable | Mutable |
+-----------+-----------+---------+
| 0.01 | 28 | 111 |
| 0.02 | 23 | 124 |
| 0.05 | 39 | 115 |
| 0.1 | 113 | 129 |
| 0.2 | 174 | 140 |
| 0.5 | 472 | 200 |
| 0.75 | 722 | 203 |
| 0.9 | 786 | 202 |
| 1.0 | 743 | 212 |
+-----------+-----------+---------+
range.size == 500000
+-----------+-----------+---------+
| Fraction | Immutable | Mutable |
+-----------+-----------+---------+
| 0.01 | 73 | 717 |
| 0.02 | 140 | 771 |
| 0.05 | 328 | 722 |
| 0.1 | 538 | 706 |
| 0.2 | 1053 | 836 |
| 0.5 | 2543 | 1149 |
| 0.75 | 3539 | 1260 |
| 0.9 | 4171 | 1305 |
| 1.0 | 4403 | 1522 |
+-----------+-----------+---------+
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With