Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

maximal intersection between n sets

Tags:

algorithm

set

I have x sets with y elements(unsorted integers) in each of them. I want to find maximal size of intersection between pair of this sets.

For example:

*5 sets, size = 3

set 1 : 1 2 3

set 2 : 4 2 3

set 3 : 5 6 7

set 4 : 5 8 9

set 5 : 5 10 11

maximal intersection have set 1 with set 2 and it's size is 2; the answer is 2.

So, I can do it in O(x^2 * y) using HashSets, by just looking out all pairs and calculating their intersection size. But I want to do it faster. I think there are specific algorithm or data structure that can help. Can you give me some idea?

UPDATE : x and y is about 10^3, elements are int. And there are no equals sets.

like image 461
rusted Avatar asked Nov 05 '15 15:11

rusted


People also ask

How to find the maximum number of intersections?

Naive Approach: The simplest approach to solve the given problem is to iterate over all segments and for each segment count the number of intersections by checking it with all other segments and then print the maximum among all the count of intersections obtained.

How do you find the number of intersections?

Two lines in 2D either intersect once or they are parallel. If no lines are parallel, there are n*(n-1)/2 intersections.


2 Answers

One optimisation that I can think of is remembering intersection size between the first set and the rest of them and then use the data to cut some cases.

How you can use it:

If you have sets A, B, C of length n and

intersection(A,B) = p
intersection(A,C) = q

then

intersection(B,C) <= n - abs(p - q)

For sets in your case:

S0 = { 1 2 3 }
S1 = { 4 2 3 }
S2 = { 5 6 7 }

you compute intersection(S0,S1) = 2 and remember the result:

[ i(0,1)=2 ]

then intersection(S0,S2) = 0, so

[ i(0,1)=2; i(0,2)=0 ]

And when you compute intersection(S1,S2) after comparing first elements

(S1[0]=4 != S2[0]=5)

you can say that intersection(S1,S2) <= 2 that is the best result that you have so far.

What can be further improvement is to remember more exact results of intersections but still not computing all of them.

I'm not sure if that's the best option. Maybe there exists totally different approach to this.

like image 74
kostek Avatar answered Sep 30 '22 12:09

kostek


Here is some psuedocode:

function max_intersection(vector<vector<int>> sets):
    hashmap<int, vector<set_id>> val_map;
    foreach set_id:set in sets:
        foreach val in set:
            val_map[val].push_back(set_id);
    max_count = 0
    vector<int> counts = vector<int>(size = sets.size() * sets.size(), init_value = 0);
    foreach val:set_ids in val_map:
        foreach id_1:set_id_1 in set_ids:
            foreach id_2:set_id_2 in set_ids where id_2 > id_1:
                count = ++counts[set_id_1 * sets.size() + set_id_2];
                if (count > max_count):
                    max_count = count;
    return max_count;

So if X is the number of sets and Y is the number of elements in each set:

  1. Inserting into val_map is O(X*Y)
  2. Creating counts and initializing each element to zero is O(X^2)
  3. If there are no intersections (each value occurs exactly once), the last loop runs in time O(X*Y). However, at the other extreme, if there is a large number of intersections (all sets are equivalent), then the last loop runs in O(X^2*Y).

So depending on the amount of intersections, the time complexity is somewhere between O(X*Y + X^2) and O(X^2*Y).

like image 29
Joel Avatar answered Sep 30 '22 12:09

Joel