Lately I've been tuning the performance of some large, shuffle-heavy jobs. Looking at the Spark UI, I noticed an option called "Shuffle Read Blocked Time" under the Additional Metrics section.
This "Shuffle Read Blocked Time" seems to account for upwards of 50% of the task duration for a large swath of tasks.
While I can intuit some possibilities for what this means, I can't find any documentation that explains what it actually represents. Needless to say, I also haven't been able to find any resources on mitigation strategies.
Can anyone provide some insight into how I might reduce Shuffle Read Blocked Time?
Spark shuffles the mapped data across partitions; sometimes it also stores the shuffled data on disk for reuse when it needs to recalculate. Finally, it runs reduce tasks on each partition based on key.
The chunk of shuffle data written by a shuffle map task for a given shuffle reduce task is called a shuffle block. Further, each shuffle map task informs the driver about the shuffle data it has written.
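To make that concrete, here is a minimal sketch (the input path is hypothetical): reduceByKey is a wide operation, so its map side writes shuffle blocks and its reduce side fetches them, and persisting the result lets Spark reuse the shuffled data instead of recomputing it:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("ShuffleSketch").getOrCreate()
val sc = spark.sparkContext

val counts = sc.textFile("hdfs:///logs/app.log")  // hypothetical input path
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)  // wide dependency: map side writes shuffle blocks, reduce side fetches them

// Persisting the result lets Spark reuse the shuffled data on subsequent
// actions instead of recomputing the whole shuffle.
counts.persist()
counts.count()
```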
Task Deserialization Time: Spark by default uses the Java serializer for object serialization. To enable the Kryo serializer, which outperforms the default Java serializer in both time and space, set the spark.serializer parameter to org.apache.spark.serializer.KryoSerializer.
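A minimal sketch of wiring that up at session creation (the app name and registered class are hypothetical):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("KryoExample")  // hypothetical app name
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Registering frequently serialized classes lets Kryo write compact IDs
  // instead of full class names (optional but recommended).
  .config("spark.kryo.classesToRegister", "com.example.MyRecord")  // hypothetical class
  .getOrCreate()
```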
"Shuffle Read Blocked Time" is the time that tasks spent blocked waiting for shuffle data to be read from remote machines. The exact metric it feeds from is shuffleReadMetrics.fetchWaitTime.
It's hard to give input on a mitigation strategy without actually knowing what data you're trying to read or what sort of remote machines you're reading from. Generally, though, the usual levers are: shuffle less data (filter or pre-aggregate before the wide operation), even out partition skew so no single reducer waits on an oversized block, and check that the network between executors isn't the bottleneck. A sketch of the first lever follows.
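As a rough illustration (the table paths and column names below are made up), pre-aggregating before a join cuts the volume of rows that has to cross the network, and spark.sql.shuffle.partitions controls reduce-side parallelism:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("ShuffleTuning").getOrCreate()
import spark.implicits._

// Hypothetical inputs, purely for illustration.
val events = spark.read.parquet("/data/events")
val users  = spark.read.parquet("/data/users")

// Aggregate before the join instead of after, so far fewer rows
// are shuffled across the network.
val perUser = events.groupBy($"userId").agg(count("*").as("eventCount"))
val joined  = perUser.join(users, Seq("userId"))

// Match shuffle parallelism to the data volume and cluster size.
spark.conf.set("spark.sql.shuffle.partitions", "400")
```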
As to the metrics, this documentation should shed some light on them: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-webui-StagePage.html
Lastly, I also found it hard to find information on Shuffle Read Blocked Time, but if you put the phrase in quotes, "Shuffle Read Blocked Time", in a Google search, you'll find some decent results.