I want to reset the spark.sql.shuffle.partitions configuration in my PySpark code, since I need to join two big tables. But the following code doesn't work in the latest Spark version; the error says there is no method "setConf" in xxx:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys
import pyspark
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
sc = SparkContext('local')
spark = SparkSession(sc)
spark.sparkContext.setConf("spark.sql.shuffle.partitions", "1000")
spark.sparkContext.setConf("spark.default.parallelism", "1000")
# or using the following; neither works
spark.setConf("spark.sql.shuffle.partitions", "1000")
spark.setConf("spark.default.parallelism", "1000")
I would like to know how to set "spark.sql.shuffle.partitions" now.
SparkSession provides a RuntimeConfig interface to set and get Spark-related parameters. The answer to your question would be:
spark.conf.set("spark.sql.shuffle.partitions", 1000)
Refer: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.RuntimeConfig
I missed that your question was about PySpark. PySpark exposes the same interface as spark.conf, so the call above works unchanged.
Refer: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=sparksession#pyspark.sql.SparkSession.conf
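For completeness, here is a minimal sketch of both settings in PySpark (the master and app name are just placeholders). Note that spark.sql.shuffle.partitions is a SQL runtime conf and can be changed on a live session, whereas spark.default.parallelism is a core setting read when the SparkContext is created, so it is safest to supply it when the session is built:

from pyspark.sql import SparkSession

# Core settings such as spark.default.parallelism are read when the
# SparkContext starts, so pass them to the builder up front.
spark = (
    SparkSession.builder
    .master("local[*]")                   # placeholder master
    .appName("big-join")                  # placeholder app name
    .config("spark.default.parallelism", "1000")
    .getOrCreate()
)

# SQL runtime confs can be changed at any time through spark.conf.
spark.conf.set("spark.sql.shuffle.partitions", "1000")

# Verify the value took effect before running the big join.
print(spark.conf.get("spark.sql.shuffle.partitions"))  # -> 1000

If the printed value is 1000, subsequent joins and aggregations in this session will use 1000 shuffle partitions.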