Is it possible in Spark to implement the `.combinations` function from the Scala collections library?
```scala
/** Iterates over combinations.
 *
 *  @return   An Iterator which traverses the possible n-element combinations of this $coll.
 *  @example  `"abbbc".combinations(2) = Iterator(ab, ac, bb, bc)`
 */
```
For example, how can I get from RDD[X] to RDD[List[X]] or RDD[(X, X)] for combinations of size 2? And let's assume that all values in the RDD are unique.
There are two ways to create RDDs: parallelizing an existing collection in your driver program, or referencing a dataset in an external storage system, such as a shared filesystem, HDFS, HBase, or any data source offering a Hadoop InputFormat.
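For illustration, a minimal sketch of both creation paths (the HDFS path here is a placeholder, not from the question):

```scala
// From an existing collection in the driver program
val fromCollection = sc.parallelize(Seq(1, 2, 3, 4, 5))

// From an external dataset; the path is hypothetical
val fromFile = sc.textFile("hdfs:///path/to/data.txt")
```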
You can also create an RDD from another RDD, using transformations like map, flatMap, and filter. For example, the sketch below creates a new RDD "rdd3" by adding 100 to each record of an existing RDD.
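A minimal sketch of that transformation (the names rdd and rdd3 are illustrative):

```scala
val rdd = sc.parallelize(Seq(1, 2, 3))

// map is a transformation: it returns a new RDD, leaving the original unchanged
val rdd3 = rdd.map(_ + 100)

rdd3.collect()   // Array(101, 102, 103)
```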
RDD has been the primary user-facing API in Spark since its inception. At its core, an RDD is an immutable distributed collection of elements of your data, partitioned across the nodes in your cluster, that can be operated on in parallel with a low-level API offering transformations and actions.
The action count() returns the number of elements in an RDD. For example, if an RDD has the values {1, 2, 2, 3, 4, 5, 5, 6}, then rdd.count() will return 8.
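As a quick sketch:

```scala
val rdd = sc.parallelize(Seq(1, 2, 2, 3, 4, 5, 5, 6))

// count() is an action, so it triggers execution and returns a result to the driver
rdd.count()   // res: Long = 8
```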
Cartesian product and combinations are two different things: the cartesian product will create an RDD of size rdd.count()^2, while combinations will create an RDD of size "rdd.count() choose 2". For example, with 5 unique elements the cartesian product has 25 pairs, but only 10 of them are distinct 2-element combinations.
```scala
val rdd = sc.parallelize(1 to 5)

// Keep exactly one ordering of each pair: (a, b) with a < b
val combinations = rdd.cartesian(rdd).filter { case (a, b) => a < b }

combinations.collect()
// 10 pairs: (1,2), (1,3), (1,4), (1,5), (2,3), (2,4), (2,5), (3,4), (3,5), (4,5)
```
Note that this will only work if an ordering is defined on the elements, since we use <. This version only chooses pairs, but it can easily be extended to larger combinations by requiring a < b for every adjacent pair of elements a and b in each tuple.
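For example, a minimal sketch of the same idea for combinations of size 3, assuming the same rdd as above:

```scala
// Two cartesian products give ((a, b), c); the a < b && b < c filter keeps
// exactly one ordering of each 3-element subset
val triples = rdd.cartesian(rdd).cartesian(rdd)
  .filter { case ((a, b), c) => a < b && b < c }
  .map { case ((a, b), c) => (a, b, c) }

triples.collect()   // 10 triples for 1 to 5, e.g. (1,2,3), (1,2,4), ..., (3,4,5)
```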