Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sorting in O(n) intersect

  • Let S1 and S2 be two sets of integers (they are not necessarily disjoint).
  • We know that |S1| = |S2| = n (i.e. each set has n integers).
  • Each set is stored in an array of length n, where its integers are sorted in ascending order.
  • Let k ≥ 1 be an integer.
  • Design an algorithm to find the k smallest integers in S1 ∩ S2 in O(n) time.

This is what I have so far:

  1. Create a new array called Intersection
  2. For each e in S1 add e to hashset in O(n) time
  3. For each e in S2 check if e exists in hashset in O(n) time
  4. If e exists in hashset add e to Intersection
  5. Once comparisons are done sort Intersection by count sort in O(n) time
  6. return the first k integers

Thus O(n) + O(n) + O(n) = O(n)

Am I on the right track?

like image 867
101ldaniels Avatar asked Feb 17 '26 19:02

101ldaniels


1 Answers

Yes, you're definitely on the right track but there's actually no need at all to generate a hash-table or extra set. As your two sets are already sorted, you can simply run an index/pointer through both of them, looking for the common elements.

For example, to find the first common element from the two sets, use the following pseudo-code:

start at first index of both sets
while more elements in both sets, and current values are different:
    if set1 value is less than set2 value:
        advance set1 index
    else
        advance set2 index

At the end of that, set1 index will refer to an intersect point provided that neither index has moved beyond the last element in their respective list. You can then just use that method in a loop to find the first x intersection values.

Here's a proof of concept in Python 3 that gives you the first three numbers that are in the two lists (multiples-of-two and multiples-of-three). The full intersection would be {0, 6, 12, 18, 24} but you will see that it will only extract the first three of those:

# Create the two lists to be used for intersection.

set1 = [i * 2 for i in range(15)] ; print(set1) # doubles
set2 = [i * 3 for i in range(15)] ; print(set2) # trebles

idx1 = 0 ; count1 = len(set1)
idx2 = 0 ; count2 = len(set2)

# Only want first three.

need = 3
while need > 0:
    # Continue until we find next intersect or end of a list.

    while idx1 < count1 and idx2 < count2 and set1[idx1] != set2[idx2]:
        # Advance pointer of list with lowest value.

        if set1[idx1] < set2[idx2]:
            idx1 += 1
        else:
            idx2 += 1

    # Break if reached end of a list with no intersect.

    if idx1 >= count1 or idx2 >= count2:
        break

    # Otherwise print intersect and advance to next list candidate.

    print(set1[idx1]) ; need -= 1
    idx1 += 1 ; idx2 += 1

The output is, as expected:

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
[0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42]
0
6
12

If you needed a list at the end rather than just printing out the intersect points, you would simply initialise an empty container before the loop and the append the value to it rather than printing it. This then becomes a little more like your proposed solution but with the advantage of not needing hash tables or sorting.

like image 61
paxdiablo Avatar answered Feb 20 '26 19:02

paxdiablo



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!