I would like to clarify, in depth, how shuffle works and how Spark uses shuffle managers. Here are some very helpful resources I have been reading:
https://trongkhoanguyenblog.wordpress.com/
https://0x0fff.com/spark-architecture-shuffle/
https://github.com/JerryLead/SparkInternals/blob/master/markdown/english/4-shuffleDetails.md
Reading them, I understood there are different shuffle managers. I want to focus on two of them: the hash manager and the sort manager (which is the default manager).
To frame my question, I want to start from a very common transformation:

    val rdd = pairRdd.reduceByKey(_ + _)
This transformation triggers map-side aggregation and then a shuffle to bring all records with the same key into the same partition.
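For reference, here is a minimal, self-contained version of that snippet (the app name and sample data are invented for illustration); it also prints the lineage, so the ShuffledRDD introduced by reduceByKey is visible:

    import org.apache.spark.{SparkConf, SparkContext}

    object ReduceByKeyDemo {
      def main(args: Array[String]): Unit = {
        // Local SparkContext purely for illustration
        val sc = new SparkContext(new SparkConf().setAppName("reduceByKey-demo").setMaster("local[*]"))
        val pairRdd = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
        val rdd = pairRdd.reduceByKey(_ + _)
        // The lineage shows the ShuffledRDD that reduceByKey introduces
        println(rdd.toDebugString)
        // Output order may vary, e.g. (a,4), (b,2)
        println(rdd.collect().mkString(", "))
        sc.stop()
      }
    }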
My questions are:

- Is map-side aggregation implemented internally with a mapPartitions transformation, aggregating all records with the same key via the combiner function, or is it implemented with an AppendOnlyMap or ExternalAppendOnlyMap?
- If AppendOnlyMap or ExternalAppendOnlyMap is used for aggregation, is it also used for the reduce-side aggregation that happens in the ResultTask?
- What exactly is the purpose of these two kinds of maps (AppendOnlyMap and ExternalAppendOnlyMap)?
- Are AppendOnlyMap and ExternalAppendOnlyMap used by all shuffle managers, or just by the sort manager?
- I read that after an AppendOnlyMap or ExternalAppendOnlyMap fills up, it is spilled to a file; how exactly does that step happen?
Using the sort shuffle manager, we use an AppendOnlyMap to aggregate and combine partition records, right? Then, when execution memory fills up, we start sorting the map, spill it to disk, and then clean up the map. My question is: what is the difference between spill to disk and shuffle write? Both basically consist of writing files on the local file system, but they are treated differently: shuffle-write records are not put into the AppendOnlyMap.
Can you explain in depth what happens when reduceByKey is executed, covering all the steps involved to accomplish it? For example, all the steps for map-side aggregation, shuffling, and so on.
The results of the map tasks are kept in memory. When the results do not fit in memory, Spark stores the data on disk. Spark shuffles the mapped data across partitions, and sometimes it also stores the shuffled data on disk for reuse when it needs to be recalculated. Finally, it runs reduce tasks on each partition based on the key.
The next strategy is to reduce the amount of data being shuffled as a whole. A few of the things we can do are: get rid of the columns you don't need, filter out unnecessary records, and optimize data ingestion. De-normalize the datasets, specifically if the shuffle is caused by a join.
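As a rough illustration (the dataset, path, and column names below are made up, not taken from the question), projecting and filtering before the wide operation means only the trimmed data gets shuffled:

    import org.apache.spark.sql.SparkSession

    object TrimBeforeShuffle {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("trim-before-shuffle").master("local[*]").getOrCreate()
        // Hypothetical wide dataset: events(userId, country, payload, amount)
        val events = spark.read.parquet("/tmp/events")
        val trimmed = events
          .select("userId", "amount")   // drop the columns you don't need
          .filter("amount > 0")         // filter out unnecessary records
        // Only the two projected columns are shuffled by the aggregation below
        trimmed.groupBy("userId").sum("amount").show()
        spark.stop()
      }
    }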
Historically, there have been three different implementations of shuffle in Spark, each with its own advantages and drawbacks.
Shuffle sort-merge join involves shuffling the data so that rows with the same join key end up on the same worker, and then performing a sort-merge join at the partition level on the worker nodes. Note that since Spark 2.3 this is the default join strategy in Spark SQL, and it can be disabled via the spark.sql.join.preferSortMergeJoin configuration.
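For illustration only (the tiny DataFrames below are invented and exist just to produce a plan), you can check which join strategy the planner picked by printing the physical plan; with broadcasting disabled, a plain equi-join typically shows up as a SortMergeJoin node:

    import org.apache.spark.sql.SparkSession

    object JoinPlanDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("join-plan-demo").master("local[*]").getOrCreate()
        import spark.implicits._
        // Disable broadcast joins so the planner falls back to a shuffle-based strategy
        spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
        val left  = Seq((1, "a"), (2, "b")).toDF("id", "l")
        val right = Seq((1, "x"), (3, "y")).toDF("id", "r")
        // The physical plan printed by explain() shows the chosen strategy (e.g. SortMergeJoin)
        left.join(right, "id").explain()
        spark.stop()
      }
    }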
Here is a step-by-step description of what happens when reduceByKey is executed:

1. reduceByKey calls combineByKeyWithClassTag, with an identity createCombiner and the reduce function used as both mergeValue and mergeCombiners (a sketch using the public combineByKey API follows below).
2. combineByKeyWithClassTag creates an Aggregator and returns a ShuffledRDD. Both the "map"-side and the "reduce"-side aggregations use this internal mechanism and don't utilize mapPartitions.
3. The Aggregator uses ExternalAppendOnlyMap for both combineValuesByKey ("map side reduction") and combineCombinersByKey ("reduce side reduction"), and both of these call ExternalAppendOnlyMap.insertAll.
Inside ExternalAppendOnlyMap, the flow is:

1. ExternalAppendOnlyMap keeps track of the spilled parts and of the current in-memory map (a SizeTrackingAppendOnlyMap).
2. The insertAll method updates the in-memory map and checks, on each insert, whether the estimated size of the current map exceeds the threshold. It uses the inherited Spillable.maybeSpill method; if the threshold is exceeded, that method calls spill as a side effect, and insertAll initializes a clean SizeTrackingAppendOnlyMap.
3. spill calls spillMemoryIteratorToDisk, which gets a DiskBlockObjectWriter object from the block manager (a simplified sketch of this insert-and-maybe-spill loop follows below).
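A toy model of that insert-and-maybe-spill loop could look like the following. This is a simplified illustration of the pattern only, not Spark's ExternalAppendOnlyMap: the real class estimates sizes in bytes through Spillable.maybeSpill, sorts the map, and writes spill files through DiskBlockObjectWriter, none of which appears here.

    import scala.collection.mutable

    // Simplified illustration of the insertAll / maybeSpill pattern described above.
    class ToyExternalMap[K, V](mergeValue: (V, V) => V, spillThreshold: Int = 10000) {
      private var current = mutable.HashMap.empty[K, V]               // stand-in for SizeTrackingAppendOnlyMap
      private val spilledParts = mutable.ArrayBuffer.empty[Map[K, V]] // stand-in for on-disk spill files

      def insertAll(records: Iterator[(K, V)]): Unit =
        records.foreach { case (k, v) =>
          current.update(k, current.get(k).map(mergeValue(_, v)).getOrElse(v))
          maybeSpill()
        }

      private def maybeSpill(): Unit =
        if (current.size > spillThreshold) {     // Spark checks an estimated size in bytes, not an entry count
          spilledParts += current.toMap          // Spark sorts the map and writes it to disk here
          current = mutable.HashMap.empty[K, V]  // insertAll continues with a clean in-memory map
        }

      // Merge the in-memory map with all spilled parts (Spark streams a merge of sorted spills instead)
      def iterator: Iterator[(K, V)] =
        (spilledParts :+ current.toMap)
          .flatten
          .groupBy(_._1)
          .map { case (k, kvs) => k -> kvs.map(_._2).reduce(mergeValue) }
          .iterator
    }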
These insertAll steps are applied for both the map-side and the reduce-side aggregations, with the corresponding Aggregator functions and the shuffle stage in between.
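As a rough way to see what "map-side reduction, shuffle, reduce-side reduction" means, the two phases can be written out by hand. Spark does not implement reduceByKey like this (it goes through the Aggregator/ExternalAppendOnlyMap path described above, not mapPartitions); this is only a conceptual sketch, and sc is assumed to be an existing SparkContext:

    import org.apache.spark.HashPartitioner

    val data = sc.parallelize(Seq(("a", 1), ("a", 3), ("b", 2)), numSlices = 2)

    // Phase 1 ("map side"): combine values per key inside each input partition
    val mapSide = data.mapPartitions { iter =>
      val partials = scala.collection.mutable.HashMap.empty[String, Int]
      iter.foreach { case (k, v) => partials(k) = partials.getOrElse(k, 0) + v }
      partials.iterator
    }

    // Shuffle: bring all partial sums for a key to the same partition
    val shuffled = mapSide.partitionBy(new HashPartitioner(2))

    // Phase 2 ("reduce side"): combine the partial sums per key
    val reduceSide = shuffled.mapPartitions { iter =>
      val totals = scala.collection.mutable.HashMap.empty[String, Int]
      iter.foreach { case (k, v) => totals(k) = totals.getOrElse(k, 0) + v }
      totals.iterator
    }
    // reduceSide.collect() gives the same result as data.reduceByKey(_ + _)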
As of Spark 2.0, there is only the sort-based shuffle manager: SPARK-14667