I have x sets with y elements (unsorted integers) in each of them. I want to find the maximal size of the intersection between any pair of these sets.
For example:
*5 sets, size = 3
set 1 : 1 2 3
set 2 : 4 2 3
set 3 : 5 6 7
set 4 : 5 8 9
set 5 : 5 10 11
The maximal intersection is between set 1 and set 2, and its size is 2; so the answer is 2.
So, I can do it in O(x^2 * y) using HashSets, by simply going over all pairs and computing their intersection sizes. But I want to do it faster. I think there is a specific algorithm or data structure that can help. Can you give me some ideas?
UPDATE: x and y are about 10^3, the elements are ints, and no two sets are equal.
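For reference, the O(x^2 * y) baseline described in the question can be sketched like this (an illustrative C++ version; the function name is mine):

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_set>
#include <vector>

// The question's baseline: for every pair of sets, load one into a hash
// set and count how many elements of the other it contains.
// Runs in O(x^2 * y) for x sets of y elements each.
int maxPairIntersection(const std::vector<std::vector<int>>& sets) {
    int best = 0;
    for (std::size_t i = 0; i < sets.size(); ++i) {
        std::unordered_set<int> a(sets[i].begin(), sets[i].end());
        for (std::size_t j = i + 1; j < sets.size(); ++j) {
            int common = 0;
            for (int v : sets[j])
                if (a.count(v)) ++common;
            best = std::max(best, common);
        }
    }
    return best;
}
```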
One optimisation I can think of is remembering the intersection sizes between the first set and the rest of them, and then using that data to prune some cases.
How you can use it:
If you have sets A, B, C of length n and

intersection(A,B) = p
intersection(A,C) = q

then

intersection(B,C) <= n - abs(p - q)

(Elements of B ∩ C that lie inside A number at most min(p, q); those outside A number at most n - max(p, q); the sum is n - abs(p - q).)
For the sets in your case:
S0 = { 1 2 3 }
S1 = { 4 2 3 }
S2 = { 5 6 7 }
you compute intersection(S0,S1) = 2 and remember the result:
[ i(0,1)=2 ]
then intersection(S0,S2) = 0, so
[ i(0,1)=2; i(0,2)=0 ]
And when it comes to intersection(S1,S2), even before comparing any elements you can say that intersection(S1,S2) <= 3 - abs(2 - 0) = 1, which cannot beat the best result you have so far (2), so the pair can be skipped.
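The inequality can be checked numerically on these example sets; here is a small C++ sketch (the helper names are mine):

```cpp
#include <cstdlib>
#include <unordered_set>
#include <vector>

// Helper (name is mine): exact intersection size of two sets.
int interSize(const std::vector<int>& a, const std::vector<int>& b) {
    std::unordered_set<int> s(a.begin(), a.end());
    int c = 0;
    for (int v : b)
        if (s.count(v)) ++c;
    return c;
}

// Upper bound n - |p - q| on |B ∩ C|, given p = |A ∩ B| and q = |A ∩ C|.
int pairBound(int n, int p, int q) { return n - std::abs(p - q); }
```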
A further improvement would be to remember more precise intersection results while still not computing all of them. I'm not sure that's the best option, though; maybe a totally different approach exists.
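One way the pruning idea could look in code (my elaboration, not the answer's exact algorithm): compute the intersections with a single reference set up front, then skip any pair whose upper bound cannot beat the best result found so far.

```cpp
#include <algorithm>
#include <cstdlib>
#include <unordered_set>
#include <vector>

// Exact intersection size of two sets (helper, name is mine).
int interSize(const std::vector<int>& a, const std::vector<int>& b) {
    std::unordered_set<int> s(a.begin(), a.end());
    int c = 0;
    for (int v : b)
        if (s.count(v)) ++c;
    return c;
}

// Sketch of the pruning idea: pick sets[0] as the reference set R,
// record p[i] = |R ∩ S_i|, and skip any pair (i, j) whose upper bound
// n - |p[i] - p[j]| cannot beat the best result found so far.
int maxIntersectionPruned(const std::vector<std::vector<int>>& sets) {
    const int n = static_cast<int>(sets[0].size());
    const int x = static_cast<int>(sets.size());
    std::vector<int> p(x);
    p[0] = n;  // |R ∩ R| = n
    for (int i = 1; i < x; ++i) p[i] = interSize(sets[0], sets[i]);
    int best = 0;
    for (int i = 0; i < x; ++i)
        for (int j = i + 1; j < x; ++j) {
            if (n - std::abs(p[i] - p[j]) <= best) continue;  // prune
            best = std::max(best, interSize(sets[i], sets[j]));
        }
    return best;
}
```

In the worst case this still degenerates to O(x^2 * y), but pairs whose bound is below the running maximum are skipped without touching their elements.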
Here is some pseudocode:
function max_intersection(vector<vector<int>> sets):
    hashmap<int, vector<set_id>> val_map;
    foreach set_id:set in sets:
        foreach val in set:
            val_map[val].push_back(set_id);

    max_count = 0
    vector<int> counts = vector<int>(size = sets.size() * sets.size(), init_value = 0);
    foreach val:set_ids in val_map:
        foreach id_1:set_id_1 in set_ids:
            foreach id_2:set_id_2 in set_ids where id_2 > id_1:
                count = ++counts[set_id_1 * sets.size() + set_id_2];
                if (count > max_count):
                    max_count = count;
    return max_count;
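A direct C++ rendering of that pseudocode might look like this (my translation; it relies on set ids being pushed into `val_map` in increasing order, so each pair is counted once):

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Invert the sets into a value -> list-of-set-ids map, then bump a pair
// counter for every two sets that share a value. counts[i * x + j]
// ends up holding |S_i ∩ S_j| for i < j.
int maxIntersection(const std::vector<std::vector<int>>& sets) {
    const std::size_t x = sets.size();
    std::unordered_map<int, std::vector<std::size_t>> valMap;
    for (std::size_t id = 0; id < x; ++id)
        for (int v : sets[id])
            valMap[v].push_back(id);

    std::vector<int> counts(x * x, 0);
    int maxCount = 0;
    for (const auto& entry : valMap) {
        const auto& ids = entry.second;  // set ids sharing this value
        for (std::size_t a = 0; a < ids.size(); ++a)
            for (std::size_t b = a + 1; b < ids.size(); ++b) {
                int c = ++counts[ids[a] * x + ids[b]];
                maxCount = std::max(maxCount, c);
            }
    }
    return maxCount;
}
```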
So if X is the number of sets and Y is the number of elements in each set:

Building val_map is O(X*Y).
Allocating counts and initializing each element to zero is O(X^2).
If there are no intersections at all, the last loop runs in O(X*Y). However, at the other extreme, if there is a large number of intersections (all sets equal), the last loop runs in O(X^2*Y).

So, depending on the number of intersections, the time complexity is somewhere between O(X*Y + X^2) and O(X^2*Y).