I have x sets with y elements (unsorted integers) in each of them. I want to find the maximal size of the intersection between any pair of these sets.
For example:
*5 sets, size = 3
set 1 : 1 2 3
set 2 : 4 2 3
set 3 : 5 6 7
set 4 : 5 8 9
set 5 : 5 10 11
The maximal intersection is between set 1 and set 2, and its size is 2; so the answer is 2.
So, I can do it in O(x^2 * y) using HashSets, by simply going over all pairs and computing their intersection sizes. But I want to do it faster. I think there is a specific algorithm or data structure that can help. Can you give me some ideas?
UPDATE: x and y are about 10^3, the elements are ints, and no two sets are equal.
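For reference, the O(x^2 * y) baseline described in the question can be sketched like this (an illustrative C++ version; the function name is mine):

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_set>
#include <vector>

// The question's baseline: for every pair of sets, load one into a hash
// set and count how many elements of the other it contains.
// Runs in O(x^2 * y) for x sets of y elements each.
int maxPairIntersection(const std::vector<std::vector<int>>& sets) {
    int best = 0;
    for (std::size_t i = 0; i < sets.size(); ++i) {
        std::unordered_set<int> a(sets[i].begin(), sets[i].end());
        for (std::size_t j = i + 1; j < sets.size(); ++j) {
            int common = 0;
            for (int v : sets[j])
                if (a.count(v)) ++common;
            best = std::max(best, common);
        }
    }
    return best;
}
```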
One optimisation I can think of is remembering the intersection sizes between the first set and the rest of them, and then using that data to prune some cases.
How you can use it:
If you have sets A, B, C of length n and

intersection(A,B) = p
intersection(A,C) = q

then

intersection(B,C) <= n - abs(p - q)

(Elements of B ∩ C that lie inside A number at most min(p, q); those outside A number at most n - max(p, q); the sum is n - abs(p - q).)
For the sets in your case:
S0 = { 1 2 3 }
S1 = { 4 2 3 }
S2 = { 5 6 7 }
you compute intersection(S0,S1) = 2 and remember the result:
[ i(0,1)=2 ]
then intersection(S0,S2) = 0, so
[ i(0,1)=2; i(0,2)=0 ]
And when it comes to intersection(S1,S2), even before comparing any elements you can say that intersection(S1,S2) <= 3 - abs(2 - 0) = 1, which cannot beat the best result you have so far (2), so the pair can be skipped.
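The inequality can be checked numerically on these example sets; here is a small C++ sketch (the helper names are mine):

```cpp
#include <cstdlib>
#include <unordered_set>
#include <vector>

// Helper (name is mine): exact intersection size of two sets.
int interSize(const std::vector<int>& a, const std::vector<int>& b) {
    std::unordered_set<int> s(a.begin(), a.end());
    int c = 0;
    for (int v : b)
        if (s.count(v)) ++c;
    return c;
}

// Upper bound n - |p - q| on |B ∩ C|, given p = |A ∩ B| and q = |A ∩ C|.
int pairBound(int n, int p, int q) { return n - std::abs(p - q); }
```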
A further improvement would be to remember more precise intersection results while still not computing all of them. I'm not sure that's the best option, though; maybe a totally different approach exists.
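One way the pruning idea could look in code (my elaboration, not the answer's exact algorithm): compute the intersections with a single reference set up front, then skip any pair whose upper bound cannot beat the best result found so far.

```cpp
#include <algorithm>
#include <cstdlib>
#include <unordered_set>
#include <vector>

// Exact intersection size of two sets (helper, name is mine).
int interSize(const std::vector<int>& a, const std::vector<int>& b) {
    std::unordered_set<int> s(a.begin(), a.end());
    int c = 0;
    for (int v : b)
        if (s.count(v)) ++c;
    return c;
}

// Sketch of the pruning idea: pick sets[0] as the reference set R,
// record p[i] = |R ∩ S_i|, and skip any pair (i, j) whose upper bound
// n - |p[i] - p[j]| cannot beat the best result found so far.
int maxIntersectionPruned(const std::vector<std::vector<int>>& sets) {
    const int n = static_cast<int>(sets[0].size());
    const int x = static_cast<int>(sets.size());
    std::vector<int> p(x);
    p[0] = n;  // |R ∩ R| = n
    for (int i = 1; i < x; ++i) p[i] = interSize(sets[0], sets[i]);
    int best = 0;
    for (int i = 0; i < x; ++i)
        for (int j = i + 1; j < x; ++j) {
            if (n - std::abs(p[i] - p[j]) <= best) continue;  // prune
            best = std::max(best, interSize(sets[i], sets[j]));
        }
    return best;
}
```

In the worst case this still degenerates to O(x^2 * y), but pairs whose bound is below the running maximum are skipped without touching their elements.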
Here is some pseudocode:
function max_intersection(vector<vector<int>> sets):
    hashmap<int, vector<set_id>> val_map;
    foreach set_id:set in sets:
        foreach val in set:
            val_map[val].push_back(set_id);

    max_count = 0
    vector<int> counts = vector<int>(size = sets.size() * sets.size(), init_value = 0);
    foreach val:set_ids in val_map:
        foreach id_1:set_id_1 in set_ids:
            foreach id_2:set_id_2 in set_ids where id_2 > id_1:
                count = ++counts[set_id_1 * sets.size() + set_id_2];
                if (count > max_count):
                    max_count = count;
    return max_count;
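A direct C++ rendering of that pseudocode might look like this (my translation; it relies on set ids being pushed into `val_map` in increasing order, so each pair is counted once):

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Invert the sets into a value -> list-of-set-ids map, then bump a pair
// counter for every two sets that share a value. counts[i * x + j]
// ends up holding |S_i ∩ S_j| for i < j.
int maxIntersection(const std::vector<std::vector<int>>& sets) {
    const std::size_t x = sets.size();
    std::unordered_map<int, std::vector<std::size_t>> valMap;
    for (std::size_t id = 0; id < x; ++id)
        for (int v : sets[id])
            valMap[v].push_back(id);

    std::vector<int> counts(x * x, 0);
    int maxCount = 0;
    for (const auto& entry : valMap) {
        const auto& ids = entry.second;  // set ids sharing this value
        for (std::size_t a = 0; a < ids.size(); ++a)
            for (std::size_t b = a + 1; b < ids.size(); ++b) {
                int c = ++counts[ids[a] * x + ids[b]];
                maxCount = std::max(maxCount, c);
            }
    }
    return maxCount;
}
```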
So if X is the number of sets and Y is the number of elements in each set:

Building val_map is O(X*Y).
Allocating counts and initializing each element to zero is O(X^2).
If there are no intersections at all, the last loop runs in O(X*Y). However, at the other extreme, if there is a large number of intersections (all sets equal), the last loop runs in O(X^2*Y).

So, depending on the number of intersections, the time complexity is somewhere between O(X*Y + X^2) and O(X^2*Y).