I'm getting confused about spill to disk and shuffle write. Using the default sort shuffle manager, we use an AppendOnlyMap for aggregating and combining partition records, right? Then when execution memory fills up, we start sorting the map, spilling it to disk and then cleaning up the map for the next spill (if one occurs). My questions are:
What is the difference between spill to disk and shuffle write? Both basically consist of creating files on the local file system and writing records.
Admittedly they are different: spill records are sorted because they pass through the map, whereas shuffle write records are not, because they don't pass through the map.
Thanks.
Giorgio
Spark gathers the required data from each partition and combines it into a new partition. During a shuffle, data is written to disk and transferred across the network.
Shuffling means the reallocation of data between multiple Spark stages. "Shuffle Write" is the sum of all written serialized data on all executors before transmitting (normally at the end of a stage) and "Shuffle Read" means the sum of read serialized data on all executors at the beginning of a stage.
Spark's operators spill data to disk if it does not fit in memory, allowing it to run well on any sized data. Likewise, cached datasets that do not fit in memory are either spilled to disk or recomputed on the fly when needed, as determined by the RDD's storage level.
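To make the "reallocation of data between stages" concrete, here is a minimal sketch (plain Python, not Spark's actual code) of how a shuffle decides which target partition each record goes to; Spark's HashPartitioner does essentially the same key-hash-modulo-partitions computation:

```python
# Sketch: hash-partitioning records, the way a shuffle routes each key
# to a target partition (and thus to a target executor).
def partition_for(key, num_partitions):
    # same idea as Spark's HashPartitioner: hash(key) mod numPartitions
    return hash(key) % num_partitions

records = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
num_partitions = 2

shuffled = {p: [] for p in range(num_partitions)}
for key, value in records:
    shuffled[partition_for(key, num_partitions)].append((key, value))

# All records with the same key land in the same target partition,
# which is what lets groupByKey / reduceByKey see them together.
```

The point of the sketch: the shuffle is about *where* records end up, independent of whether any one machine later runs out of memory.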
spill to disk and shuffle write are two different things.
spill to disk - data moves from host RAM to host disk. It is used when there is not enough RAM on your machine, and part of the RAM contents is placed on disk.
http://spark.apache.org/faq.html
Does my data need to fit in memory to use Spark?
No. Spark's operators spill data to disk if it does not fit in memory, allowing it to run well on any sized data. Likewise, cached datasets that do not fit in memory are either spilled to disk or recomputed on the fly when needed, as determined by the RDD's storage level.
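The spill mechanism the FAQ describes can be sketched like this (an illustrative Python model, not Spark's internals; the threshold and file format are made up for the example): aggregate in memory, and when the buffer grows past a limit, sort it, write it out as a spill file, and clear it for the next batch.

```python
# Sketch of "spill to disk": in-memory aggregation with a size cap.
import os
import tempfile

SPILL_THRESHOLD = 3          # hypothetical: max distinct keys before spilling
buffer = {}                  # in-memory aggregation map
spill_files = []

def spill():
    # sort by key so spill files can later be merged efficiently
    fd, path = tempfile.mkstemp(suffix=".spill")
    with os.fdopen(fd, "w") as f:
        for key in sorted(buffer):
            f.write(f"{key}\t{buffer[key]}\n")
    spill_files.append(path)
    buffer.clear()           # free memory for the next batch

def insert(key, value):
    buffer[key] = buffer.get(key, 0) + value
    if len(buffer) >= SPILL_THRESHOLD:
        spill()

for k, v in [("a", 1), ("b", 2), ("c", 3), ("a", 4)]:
    insert(k, v)
# one sorted spill file was written; ("a", 4) is still in memory
```

Note that the spill file is sorted on the way out, which matches the asker's observation that spilled records are sorted while plain shuffle-write records need not be.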
shuffle write - data moves from executor(s) to other executor(s). It is used when data needs to move between executors (e.g. due to a JOIN, groupBy, etc.).
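As an analogy (not Spark's actual file format), "shuffle write" on one executor means serializing its output records, grouped by target partition, to local disk so other executors can fetch them over the network; the byte total is what the UI reports as "Shuffle Write" for that task:

```python
# Sketch: one map task's shuffle write as per-partition serialized blocks.
import pickle

def shuffle_write(records, num_partitions):
    """Group records by target partition, serialize each group, and
    report the total bytes written (the 'Shuffle Write' metric)."""
    blocks = {p: [] for p in range(num_partitions)}
    for key, value in records:
        blocks[hash(key) % num_partitions].append((key, value))
    serialized = {p: pickle.dumps(rows) for p, rows in blocks.items()}
    shuffle_write_bytes = sum(len(b) for b in serialized.values())
    return serialized, shuffle_write_bytes

serialized, written = shuffle_write([("x", 1), ("y", 2)], num_partitions=2)
```

Unlike the spill sketch above, nothing here depends on memory pressure: shuffle write happens because data must cross executor boundaries, full stop.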
More details can be found here:
An edge case example which might help clear this issue up:
Assuming that the data holds one key, performing groupByKey will bring all the data into one partition. The shuffle size will be 9*128MB (9 executors will transfer their data to the last executor), and there won't be any spill to disk, as the executor has 100GB of RAM and only ~1GB of data.
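A quick back-of-envelope check of the numbers in that example (assuming 10 executors, each holding 128MB of records with the same key):

```python
# Back-of-envelope: shuffle size vs. spill in the single-key example.
MB = 1024 * 1024
executors = 10
data_per_executor = 128 * MB
executor_ram = 100 * 1024 * MB   # 100 GB

# 9 executors ship their partitions to the one that owns the key
shuffle_size = (executors - 1) * data_per_executor   # 9 * 128 MB
data_on_target = executors * data_per_executor       # ~1.25 GB total

spills = data_on_target > executor_ram   # all of it fits in RAM
print(shuffle_size // MB, "MB shuffled, spill needed:", spills)
# → 1152 MB shuffled, spill needed: False
```

So a large shuffle write and zero spill can happen in the same job, which is exactly the distinction the question is about.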
Regarding AppendOnlyMap:
As written in the AppendOnlyMap code - this class is a low-level implementation of a simple open hash table optimized for the append-only use case, where keys are never removed, but the value for each key may be changed.
The fact that two different modules use the same low-level class doesn't mean that those modules are related at a high level.
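The idea behind AppendOnlyMap can be sketched as follows (a simplified Python model, not the Scala implementation; the class name and fixed capacity are made up for illustration): an open-addressing hash table where keys are never removed, so probing needs no tombstones, and values are updated in place via a merge function.

```python
# Sketch: an append-only open hash table with in-place value updates.
class AppendOnlyMapSketch:
    def __init__(self, capacity=8):
        self.keys = [None] * capacity
        self.values = [None] * capacity
        self.size = 0

    def _slot(self, key):
        # linear probing: scan from the hash slot until we find the key
        # or an empty slot (safe because keys are never removed)
        i = hash(key) % len(self.keys)
        while self.keys[i] is not None and self.keys[i] != key:
            i = (i + 1) % len(self.keys)
        return i

    def change_value(self, key, update):
        """Set the value for key to update(old_value) - this mirrors how
        aggregation merges a new record into the existing combiner."""
        i = self._slot(key)
        if self.keys[i] is None:
            self.keys[i] = key
            self.size += 1
        self.values[i] = update(self.values[i])

m = AppendOnlyMapSketch()
for k, v in [("a", 1), ("b", 2), ("a", 3)]:
    m.change_value(k, lambda old, v=v: (old or 0) + v)
```

The update-function interface is what makes it handy for combining partition records: each incoming record folds into the current per-key value without ever deleting entries.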