It's Spark 2.2.0-SNAPSHOT.
Why is the number of partitions 200 after the groupBy transformation in the following example?
scala> spark.range(5).groupByKey(_ % 5).count.rdd.getNumPartitions
res0: Int = 200
What's so special about 200? Why not some other number, like 1024?
I've been pointed to Why does groupByKey operation have always 200 tasks?, which asks specifically about groupByKey, but this question is about the "mystery" behind picking 200 as the default, not about why there are 200 partitions by default.
This is controlled by spark.sql.shuffle.partitions, which defaults to 200.
In general, whenever you do a Spark SQL aggregation or join that shuffles data, this is the number of resulting partitions.
It is constant for your entire action (i.e. it is not possible to change it for one transformation and then again for another within the same action).
See http://spark.apache.org/docs/latest/sql-programming-guide.html#other-configuration-options for more information.
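As a minimal sketch (assuming a spark-shell session, and the value 8 is just an arbitrary example), you can read the setting and change it on the session before the action runs, and the shuffle will then produce that many partitions:

scala> spark.conf.get("spark.sql.shuffle.partitions")
res0: String = 200

scala> spark.conf.set("spark.sql.shuffle.partitions", 8)

scala> spark.range(5).groupByKey(_ % 5).count.rdd.getNumPartitions
res1: Int = 8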