I have two input arrays X and Y. I want to return that element of array X which occurs with highest frequency in array Y.
The naive way of doing this requires that for each element x of array X, I linearly search array Y for its number of occurrences and then return that element x which has highest frequency. Here is the pseudo algorithm:
max_frequency = 0
max_x = -1 // -1 indicates no element found
For each x in X
frequency = 0
For each y in Y
if y == x
frequency++
End For
If frequency > max_frequency
max_frequency = frequency
max_x = x
End If
End For
return max_x
As there are two nested loops, time complexity for this algorithm would be O(n^2). Can I do this in O(nlogn) or faster ?
Use a hash table mapping keys to counts. For each element in the array, do like counts[element] = counts[element] + 1
or your language's equivalent.
At the end, loop through the mappings in the hash table and find the max.
Alternatively, if you can have additional data structures, you walk the array Y, for each number updating its frequency in a hash table. This takes O(N(Y)
time. Then walk X finding which element in X has highest frequency. This takes O(N(X))
time. Overall: linear time, and since you have to look at each element of both X
and Y
in any implementation at least once (EDIT: This is not strictly speaking true in all cases/all implementations, as jwpat7 points out, though it true in the worst case), you can't do it any faster than that.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With