I'm trying to come up with a fast algorithm for, given any array of length n, obtaining the largest subarray of distinct values.
For example, the largest subarray of distinct values of
[1, 4, 3, 2, 4, 2, 8, 1, 9]
would be
[4, 2, 8, 1, 9]
This is my current solution; I think it runs in O(n^2), because check_dups runs in linear time and is called every time j or i increments.
arr = [0,...,n]
i = 0
j = 1
i_best = i
j_best = j
while i < n-1 and j < n:
    if check_dups(arr, i, j): // determines if there are duplicates in the subarray arr[i..j], in linear time
        i += 1
    else:
        if j - i > j_best - i_best:
            i_best = i
            j_best = j
        j += 1
return subarray(arr, i_best, j_best)
Does anyone have a better solution, in linear time?
Please note this is pseudocode and I'm not looking for an answer that relies on specific existing functions of a defined language (such as arr.contains()). Thanks!
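For reference, here is a rough runnable JavaScript rendering of the pseudocode above (illustrative only; this particular check_dups happens to use a Set for the linear-time duplicate check, but any equivalent mechanism would do):

// Quadratic approach from the question: check_dups scans the window arr[i..j]
// once per call, and it is called on every step of i or j, giving O(n^2) overall.
function check_dups(arr, i, j) {
  const seen = new Set()               // any structure with O(1) membership tests works here
  for (let k = i; k <= j; k++) {
    if (seen.has(arr[k])) return true
    seen.add(arr[k])
  }
  return false
}

function largest_distinct_subarray_quadratic(arr) {
  const n = arr.length
  let i = 0, j = 1
  let i_best = 0, j_best = 0
  while (i < n - 1 && j < n) {
    if (check_dups(arr, i, j)) {
      i += 1                           // shrink the window from the left
    } else {
      if (j - i > j_best - i_best) {   // record the best distinct window so far
        i_best = i
        j_best = j
      }
      j += 1                           // grow the window to the right
    }
  }
  return arr.slice(i_best, j_best + 1)
}

console.log(largest_distinct_subarray_quadratic([1, 4, 3, 2, 4, 2, 8, 1, 9]))   // [4, 2, 8, 1, 9]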
Consider the problem of finding the largest distinct-valued subarray ending at a particular index j. Conceptually this is straightforward: starting at arr[j], you go backwards and include all elements until you find a duplicate.
Let's use this intuition to solve this problem for every j from 0 up to length(arr) - 1. We need to know, at any point in the iteration, how far back we can go before we find a duplicate. That is, we need to know the least i such that subarray(arr, i, j) contains distinct values. (I'm assuming subarray treats the indices as inclusive.)
If we know i at some point in the iteration (say, when j = k), can we quickly update it when j = k+1? Yes: if we know the index of the last occurrence of arr[k+1], we can update i := max(i, lastOccurrence(arr[k+1]) + 1). With a HashMap we can look up and maintain lastOccurrence in O(1) time.
Pseudocode:
arr = ... (from input)
map = empty HashMap
i = 0
i_best = 0
j_best = 0
for j from 0 to length(arr) - 1 inclusive:
    if map contains-key arr[j]:
        i = max(i, map[arr[j]] + 1)
    map[arr[j]] = j
    if j - i > j_best - i_best:
        i_best = i
        j_best = j
return subarray(arr, i_best, j_best)
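To make this concrete, here is a minimal runnable JavaScript sketch of the pseudocode above (function and variable names are mine; JavaScript's built-in Map plays the role of the HashMap):

// Sliding window + last-occurrence map: O(n) assuming O(1) map operations.
function longest_distinct_subarray(arr) {
  const last_occurrence = new Map()    // value -> index of its most recent occurrence
  let i = 0                            // left edge of the current all-distinct window
  let i_best = 0
  let j_best = 0
  for (let j = 0; j < arr.length; j++) {
    if (last_occurrence.has(arr[j]))
      i = Math.max(i, last_occurrence.get(arr[j]) + 1)
    last_occurrence.set(arr[j], j)
    if (j - i > j_best - i_best) {
      i_best = i
      j_best = j
    }
  }
  return arr.slice(i_best, j_best + 1) // subarray with inclusive indices
}

console.log(longest_distinct_subarray([1, 4, 3, 2, 4, 2, 8, 1, 9]))   // [4, 2, 8, 1, 9]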
We can adapt pkpnd's algorithm to use an array rather than a hash map, giving an O(n log n) solution, or potentially O(n) if your data allows for an O(n) stable sort. You would, however, need a stable sorting routine that also carries along the original indexes of the elements.
1 4 3 2 4 2 8 1 9
0 1 2 3 4 5 6 7 8 (indexes)
Sorted:
1 1 2 2 3 4 4 8 9
0 7 3 5 2 1 4 6 8 (indexes)
--- ---   ---
Now, instead of a hash map, build a new array by iterating over the sorted array: wherever a value repeats, record the index of its previous occurrence at the later index's position (and -1 where there is none). The final array would look like:
 1  4  3  2  4  2  8  1  9
-1 -1 -1 -1  1  3 -1  0 -1   (previous occurrence)
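A rough sketch of that construction step in JavaScript (the built-in sort with an index tiebreaker stands in for a stable sort, so this is the O(n log n) variant; an O(n) stable sort such as counting or radix sort could be substituted where the data allows):

// Build the previous-occurrence array by sorting (value, original index) pairs.
function previous_occurrences(arr) {
  const pairs = arr.map((value, index) => [value, index])
  pairs.sort((a, b) => a[0] - b[0] || a[1] - b[1])   // by value, then by original index

  const prev = new Array(arr.length).fill(-1)
  for (let k = 1; k < pairs.length; k++) {
    // adjacent equal values: the later original index points back to the earlier one
    if (pairs[k][0] === pairs[k - 1][0])
      prev[pairs[k][1]] = pairs[k - 1][1]
  }
  return prev
}

console.log(previous_occurrences([1, 4, 3, 2, 4, 2, 8, 1, 9]))
// [-1, -1, -1, -1, 1, 3, -1, 0, -1]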
We're now ready to run pkpnd's algorithm with a slight modification:
arr = ... (from input)
map = previous occurrence array
i = 0
i_best = 0
j_best = 0
for j from 0 to length(arr) - 1 inclusive:
    if map[j] >= 0:
        i = max(i, map[j] + 1)
    if j - i > j_best - i_best:
        i_best = i
        j_best = j
return subarray(arr, i_best, j_best)
JavaScript code:
function f(arr, map){
  let i = 0
  let i_best = 0
  let j_best = 0
  for (let j = 0; j < arr.length; j++){
    if (map[j] >= 0)                      // a previous occurrence of arr[j] exists
      i = Math.max(i, map[j] + 1)         // move the window start just past it
    if (j - i > j_best - i_best){
      i_best = i
      j_best = j
    }
  }
  return [i_best, j_best]
}
let arr = [ 1, 4, 3, 2, 4, 2, 8, 1, 9]
let map = [-1,-1,-1,-1, 1, 3,-1, 0,-1]
console.log(f(arr, map))
arr = [ 1, 2, 2, 2, 2, 2, 1]
map = [-1,-1, 1, 2, 3, 4, 0]
console.log(f(arr, map))