What formula does Spark use to calculate the number of reduce tasks? I am running a couple of spark-sql queries, and the number of reduce tasks is always 200. The number of map tasks for these queries is 154. I am on Spark 1.4.1. Is this related to spark.shuffle.sort.bypassMergeThreshold, which defaults to 200?

Number reduce tasks Spark

2 Answers

It's spark.sql.shuffle.partitions that you're after. According to the Spark SQL performance tuning guide:

| Property Name                 | Default | Meaning                                        |
+-------------------------------+---------+------------------------------------------------+
| spark.sql.shuffle.partitions  | 200     | Configures the number of partitions to use     |
|                               |         | when shuffling data for joins or aggregations. |

A related option is spark.default.parallelism, which determines the 'default number of partitions in RDDs returned by transformations like join, reduceByKey, and parallelize when not set by user'. However, Spark SQL seems to ignore it, so it appears to be relevant only when working with plain RDDs.

answered Sep 18 '22 19:09

sgvd

Yes, @sgvd, that is the correct parameter. Here is how you set it in Scala:

// Set number of shuffle partitions to 3
sqlContext.setConf("spark.sql.shuffle.partitions", "3")
// Verify the setting
sqlContext.getConf("spark.sql.shuffle.partitions")
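
To check that the setting actually changes the number of reduce tasks, one option is to run an aggregation and inspect the partition count of the result. The following is a minimal sketch, assuming a Spark 1.x spark-shell session where `sc` and `sqlContext` are already defined. The toy data and the names `df` and `counts` are made up for illustration.

```scala
// Assumes a Spark 1.x spark-shell, where `sc` and `sqlContext` are predefined.
import sqlContext.implicits._

// Set the number of shuffle partitions used by Spark SQL.
sqlContext.setConf("spark.sql.shuffle.partitions", "3")

// Hypothetical toy data: 1000 rows spread over 10 keys.
val df = sc.parallelize(1 to 1000).map(i => (i % 10, i)).toDF("key", "value")

// groupBy triggers a shuffle, so the result should pick up the setting above.
val counts = df.groupBy("key").count()

// Partition count after the shuffle, which corresponds to the number of reduce tasks.
println(counts.rdd.partitions.length)
```

On Spark 1.x the printed value should match the configured number. Newer Spark versions with adaptive query execution enabled may merge shuffle partitions at runtime, so the count can come out lower there.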

answered Sep 17 '22 19:09

pmhargis

Uli Bethke
