algorithm: is there a map-reduce way to merge a group of sets by deleting all the subsets

Question

The problem is: Suppose we have a group of Sets: Set(1,2,3) Set(1,2,3,4) Set(4,5,6) Set(1,2,3,4,6), we need to delete all the subsets and finally get the Result: Set(4,5,6) Set(1,2,3,4,6). (Since both Set(1,2,3) and Set(1,2,3,4) are the subsets of Set(1,2,3,4,6), both are removed.)

And suppose that the elements of the set have order, which can be Int, Char, etc.

Is it possible to do it in a map-reduce way?

The reason to do it in a map-reduce way is that sometimes the group of Sets has a very large size, which makes it not possible to do it in the memory of a single machine. So we hope to do it in a map-reduce way, it may be not very efficient, but just work.

My problem is:

I don't know how to define a key for the key-value pair in the map-reduce process to group Sets properly.
I don't know when the process should be finished, that all the subsets have been removed.

EDIT:

The size of the data will keep growing larger in the future.
The input can be either a group of sets or multiple lines with each line containing a group of sets. Currently the input is val data = RDD[Set], I firstly do data.collect(), which results in an overall group of sets. But I can modify the generation of the input into a RDD[Array[Set]], which will give me multiple lines with each line containing a group of sets.
The elements in each set can be sorted by modifying other parts of the program.

user1952500 · Accepted Answer

I doubt this can be done by a traditional map-reduce technique which is essentially a divide-and-conquer method. This is because:

in this problem each set has to essentially be compared to all of the sets of larger cardinality whose min and max elements lie around the min and max of the smaller set.
unlike sorting and other problems amenable to map-reduce, we don't have a transitivity relation, i.e., if A is not-a-subset-of B and B is-not-subset-of C, we cannot make any statement about A w.r.t. C.

Based on the above observations this problem seems to be similar to duplicate detection and there is research on duplicate detection, for example here. Similar techniques will work well for the current problem.

algorithm: is there a map-reduce way to merge a group of sets by deleting all the subsets

Tags:

algorithm

mapreduce

宇宙人

1 Answers

user1952500

Recent Activity

Donate For Us

algorithm: is there a map-reduce way to merge a group of sets by deleting all the subsets

Tags:

algorithm

mapreduce

宇宙人

1 Answers

user1952500

Related questions

Recent Activity

Donate For Us