
How to flatten a collection with Spark/Scala?


In Scala I can flatten a collection using:

val array = Array(List("1,2,3").iterator, List("1,4,5").iterator)
//> array: Array[Iterator[String]] = Array(non-empty iterator, non-empty iterator)

array.toList.flatten
//> res0: List[String] = List(1,2,3, 1,4,5)

But how can I do something similar in Spark?

Reading the API doc (http://spark.apache.org/docs/0.7.3/api/core/index.html#spark.RDD), there does not seem to be a method that provides this functionality.

asked Apr 17 '14 by blue-sky

People also ask

How does flatten work in Scala?

The flatten function is applicable to both Scala's mutable and immutable collection data structures. The flatten method collapses the elements of a nested collection to create a single collection with elements of the same type.
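For instance, a minimal sketch on ordinary Scala collections (the values here are illustrative):

val nested = List(List(1, 2), List(3), List(4, 5))   // List[List[Int]]
nested.flatten                                       // List(1, 2, 3, 4, 5)

val maybes = List(Some(1), None, Some(3))            // Options flatten too
maybes.flatten                                       // List(1, 3)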

What is flatten in spark?

In Spark SQL, flatten is a built-in function that converts an array-of-arrays column (a nested array, i.e. ArrayType(ArrayType(StringType))) into a single array column on a Spark DataFrame. Spark SQL is the Spark module for structured data processing.
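A minimal sketch of that DataFrame function (available since Spark 2.4; the session setup, data, and column names are illustrative):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.flatten

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// One row holding a column of type ArrayType(ArrayType(StringType))
val df = Seq(Seq(Seq("a", "b"), Seq("c"))).toDF("nested")

df.select(flatten($"nested").as("flat")).show()
// +---------+
// |     flat|
// +---------+
// |[a, b, c]|
// +---------+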

How do you flatten a schema?

To flatten the active schema, select the command Schema Design | Flatten Schema. This opens the Flatten Schema dialog, which contains the names of separate files, one for each namespace that will be in the flattened schema. These default names are the same as the original filenames.


2 Answers

Use flatMap with identity from Predef; this is more readable than x => x, e.g.

myRdd.flatMap(identity) 
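A quick illustration in the Spark shell (the RDD contents are made up; sc is the shell's SparkContext):

val myRdd = sc.parallelize(Seq(List(1, 2), List(3)))
myRdd.flatMap(identity).collect()   // Array(1, 2, 3)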
answered Sep 20 '22 by samthebest


Try flatMap with an identity map function (y => y):

scala> val x = sc.parallelize(List(List("a"), List("b"), List("c", "d")))
x: org.apache.spark.rdd.RDD[List[String]] = ParallelCollectionRDD[1] at parallelize at <console>:12

scala> x.collect()
res0: Array[List[String]] = Array(List(a), List(b), List(c, d))

scala> x.flatMap(y => y)
res3: org.apache.spark.rdd.RDD[String] = FlatMappedRDD[3] at flatMap at <console>:15

scala> x.flatMap(y => y).collect()
res4: Array[String] = Array(a, b, c, d)
answered Sep 22 '22 by Josh Rosen