
How to get the product of two RDDs?

I am new to Spark. I have two RDDs and want to generate a result RDD from them, as below.

    val rdd1 = sc.parallelize(Array(1, 2))
    val rdd2 = sc.parallelize(Array('a', 'b', 'c'))

    // desired contents of resultRDD:
    // (1,a), (1,b), (1,c), (2,a), (2,b), (2,c)

Can anyone tell me which transformations or actions I need to use to generate resultRDD as above? FYI, I am writing in Scala.

EDIT

Thanks. Spark's cartesian works for me, as below.

    val data = Array('a', 'b')
    val rdd1 = sc.parallelize(data)

    val data2 = Array(1, 2, 3)
    val rdd2 = sc.parallelize(data2)

    rdd1.cartesian(rdd2).foreach(println)
Pand005 asked Nov 13 '14 06:11




1 Answer

def cartesian[U](other: RDD[U])(implicit arg0: ClassTag[U]): RDD[(T, U)]

Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of elements (a, b) where a is in this and b is in other.

This is from the Spark Scala API documentation for RDD.
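As a minimal sketch of how this applies to the data in the question: the object name `CartesianExample`, the helper `product`, the app name and the `local[2]` master are all assumptions made so the example can run on its own (in spark-shell, `sc` already exists and the setup lines are not needed). It collects the pairs to the driver and sorts them before printing, because `foreach(println)` on an RDD prints on the executors when running on a cluster, and the order of elements is not guaranteed.

    import org.apache.spark.{SparkConf, SparkContext}

    object CartesianExample {
      // Hypothetical helper: builds the two RDDs from the question and
      // returns their Cartesian product, collected and sorted on the driver.
      def product(sc: SparkContext): Array[(Int, Char)] = {
        val rdd1 = sc.parallelize(Seq(1, 2))
        val rdd2 = sc.parallelize(Seq('a', 'b', 'c'))

        // RDD[(Int, Char)] with every pairing: 2 * 3 = 6 elements.
        val resultRDD = rdd1.cartesian(rdd2)

        // collect() is fine here because the result is tiny; sorting
        // only makes the printed order deterministic.
        resultRDD.collect().sorted
      }

      def main(args: Array[String]): Unit = {
        // Local master is an assumption for a standalone test run.
        val conf = new SparkConf().setAppName("cartesian-example").setMaster("local[2]")
        val sc = new SparkContext(conf)
        try {
          product(sc).foreach(println)
          // prints (1,a) (1,b) (1,c) (2,a) (2,b) (2,c), one pair per line
        } finally {
          sc.stop()
        }
      }
    }

Note that the result has size(rdd1) * size(rdd2) elements, so cartesian gets expensive quickly on large RDDs.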

The Archetypal Paul answered Sep 19 '22 00:09