Spark: FlatMapValues query

Tags:

apache-spark

flatmap

I'm reading the Learning Spark book and couldn't understand the following pair RDD transformation:

rdd.flatMapValues(x => (x to 5))

It is applied to an RDD {(1,2),(3,4),(3,6)}, and the output of the transformation is {(1,2),(1,3),(1,4),(1,5),(3,4),(3,5)}.

Can someone please explain this?


asked May 18 '16 14:05

Vinay

1 Answer

The flatMapValues method is a combination of flatMap and mapValues.

Let's start with the given RDD.

val sampleRDD = sc.parallelize(Array((1,2),(3,4),(3,6)))

mapValues applies a function to each value while leaving each key unchanged.

For example, sampleRDD.mapValues(x => x to 5).collect() returns

Array((1,Range(2, 3, 4, 5)), (3,Range(4, 5)), (3,Range()))

Notice that for the key-value pair (3, 6), it produces (3,Range()), since 6 to 5 is an empty range.
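To experiment without a Spark cluster, the mapValues step can be mimicked on an ordinary Scala Seq of pairs. This is a minimal sketch, assuming plain Scala collections stand in for the RDD; mapValuesLocal is a hypothetical helper name, not a Spark API. Each Range is converted to a List so the printed form does not depend on the Scala version's Range.toString.

```scala
object MapValuesSketch {
  // Plain Scala stand-in for sampleRDD (assumption: no Spark involved here)
  val sample: Seq[(Int, Int)] = Seq((1, 2), (3, 4), (3, 6))

  // Hypothetical helper mirroring mapValues: transform each value, keep each key
  def mapValuesLocal[K, V, W](pairs: Seq[(K, V)])(f: V => W): Seq[(K, W)] =
    pairs.map { case (k, v) => (k, f(v)) }

  // 6 to 5 is empty, so the last pair gets an empty list
  val mapped: Seq[(Int, List[Int])] = mapValuesLocal(sample)(x => (x to 5).toList)

  def main(args: Array[String]): Unit =
    println(mapped) // List((1,List(2, 3, 4, 5)), (3,List(4, 5)), (3,List()))
}
```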

flatMap "breaks down" collections into their elements: the function passed to flatMap returns a collection for each input element, and all of those collections are flattened into a single RDD.

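For example, given val rdd2 = sampleRDD.mapValues(x => x to 5), calling rdd2.flatMap { case (k, values) => values.map(v => (k, v)) }.collect() gives (note that rdd2.flatMap(x => x) would not compile, because each element of rdd2 is a (key, Range) tuple, not a collection)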

Array((1,2),(1,3),(1,4),(1,5),(3,4),(3,5)).

That is, for each element of the collection associated with a key, we create a (key, element) pair.

Also notice that (3, Range()) does not produce any key-element pair, since its range is empty.
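The flatMap step can likewise be sketched on a local Seq, assuming plain Scala collections in place of rdd2. The function given to flatMap pairs the key with every element of its Range explicitly, so an empty Range contributes no output at all.

```scala
object FlatMapSketch {
  // Output of the mapValues step, as a local Seq instead of an RDD (assumption)
  val rdd2Local: Seq[(Int, Range)] = Seq((1, 2 to 5), (3, 4 to 5), (3, 6 to 5))

  // Emit one (key, element) pair per element; the empty Range 6 to 5 emits nothing
  val flattened: Seq[(Int, Int)] =
    rdd2Local.flatMap { case (k, values) => values.map(v => (k, v)) }

  def main(args: Array[String]): Unit =
    println(flattened) // List((1,2), (1,3), (1,4), (1,5), (3,4), (3,5))
}
```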

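Combining mapValues and flatMap gives flatMapValues: sampleRDD.flatMapValues(x => x to 5) produces the same six pairs in a single step, and, like mapValues, it leaves the keys unchanged.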
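Putting the two steps together, here is a local sketch of flatMapValues on a Seq of pairs. flatMapValuesLocal is a hypothetical name and the Seq stands in for the RDD; the sketch checks that the one-step version matches the two-step mapValues-then-flatMap version.

```scala
object FlatMapValuesSketch {
  // Plain Scala stand-in for sampleRDD (assumption: no Spark involved here)
  val sample: Seq[(Int, Int)] = Seq((1, 2), (3, 4), (3, 6))

  // Hypothetical helper: apply f to each value, then pair the key with every
  // element of the resulting collection (mapValues + flatMap in one pass)
  def flatMapValuesLocal[K, V, U](pairs: Seq[(K, V)])(f: V => Iterable[U]): Seq[(K, U)] =
    pairs.flatMap { case (k, v) => f(v).map(u => (k, u)) }

  val oneStep: Seq[(Int, Int)] = flatMapValuesLocal(sample)(x => x to 5)

  // Two-step version: mapValues, then flatMap with explicit key pairing
  val twoStep: Seq[(Int, Int)] =
    sample
      .map { case (k, v) => (k, v to 5) }
      .flatMap { case (k, values) => values.map(v => (k, v)) }

  def main(args: Array[String]): Unit = {
    println(oneStep)            // List((1,2), (1,3), (1,4), (1,5), (3,4), (3,5))
    println(oneStep == twoStep) // true
  }
}
```

Because the key is carried through untouched, Spark's own flatMapValues can keep the parent RDD's partitioning, which a plain flatMap over (key, value) tuples cannot assume.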

answered Oct 09 '22 07:10

jtitusj
