I want to reset the spark.sql.shuffle.partitions configuration in my PySpark code, since I need to join two big tables. But the following code doesn't work in the latest Spark version; the error says there is no method "setConf" in xxx:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys
import pyspark
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
sc = SparkContext('local')
spark = SparkSession(sc)
spark.sparkContext.setConf("spark.sql.shuffle.partitions", "1000")
spark.sparkContext.setConf("spark.default.parallelism", "1000")
# or using the following; neither works
spark.setConf("spark.sql.shuffle.partitions", "1000")
spark.setConf("spark.default.parallelism", "1000")
I would like to know how to set "spark.sql.shuffle.partitions" now.
SparkSession provides a RuntimeConfig interface to set and get Spark-related parameters. The answer to your question would be:
spark.conf.set("spark.sql.shuffle.partitions", 1000)
Refer: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.RuntimeConfig
I missed that your question was about PySpark. PySpark exposes the same interface as spark.conf, so the call above works unchanged.
Refer: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=sparksession#pyspark.sql.SparkSession.conf
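For completeness, here is a minimal sketch of both settings in PySpark (the master and app name are just placeholders). Note that spark.sql.shuffle.partitions is a SQL runtime conf and can be changed on a live session, whereas spark.default.parallelism is a core setting read when the SparkContext is created, so it is safest to supply it when the session is built:

from pyspark.sql import SparkSession

# Core settings such as spark.default.parallelism are read when the
# SparkContext starts, so pass them to the builder up front.
spark = (
    SparkSession.builder
    .master("local[*]")                   # placeholder master
    .appName("big-join")                  # placeholder app name
    .config("spark.default.parallelism", "1000")
    .getOrCreate()
)

# SQL runtime confs can be changed at any time through spark.conf.
spark.conf.set("spark.sql.shuffle.partitions", "1000")

# Verify the value took effect before running the big join.
print(spark.conf.get("spark.sql.shuffle.partitions"))  # -> 1000

If the printed value is 1000, subsequent joins and aggregations in this session will use 1000 shuffle partitions.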