 

How to change SparkContext properties in Interactive PySpark session

How can I change spark.driver.maxResultSize in the interactive PySpark shell? I used the following code:

from pyspark import SparkConf, SparkContext
conf = (SparkConf()
    .set("spark.driver.maxResultSize", "10g"))
sc.stop()
sc = SparkContext(conf)

but it gives me the error

AttributeError: 'SparkConf' object has no attribute '_get_object_id'
asked Sep 02 '15 by MARK


3 Answers

What you're seeing is that SparkConf isn't a Java object. The error happens because SparkContext is trying to use your SparkConf as its first positional parameter (the master URL); if you instead write sc = SparkContext(conf=conf), it will use your configuration. That being said, you might be better off starting a regular Python program rather than stopping the default Spark context and restarting it, but either way you'll need to pass the conf object as a named parameter.
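
A minimal sketch of the fix, assuming the shell's default context is bound to sc:

from pyspark import SparkConf, SparkContext

conf = SparkConf().set("spark.driver.maxResultSize", "10g")
sc.stop()                      # stop the shell's default context first
sc = SparkContext(conf=conf)   # pass conf by keyword, not positionally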

answered Oct 14 '22 by Holden


Update the configuration in Spark 2.3.1

To change the default Spark configuration, you can follow these steps:

Import the required classes

from pyspark.conf import SparkConf
from pyspark.sql import SparkSession

Get the default configurations

spark.sparkContext._conf.getAll()
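
This returns a list of (key, value) string pairs. Illustrative output from a default shell session might look like the following (exact values vary by environment):

[('spark.app.name', 'PySparkShell'), ('spark.master', 'local[*]'), ...]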

Update the default configurations

conf = spark.sparkContext._conf.setAll([
    ('spark.executor.memory', '4g'),
    ('spark.app.name', 'Spark Updated Conf'),
    ('spark.executor.cores', '4'),
    ('spark.cores.max', '4'),
    ('spark.driver.memory', '4g')
])

Stop the current Spark Session

spark.sparkContext.stop()

Create a Spark Session

spark = SparkSession.builder.config(conf=conf).getOrCreate()
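
To confirm the new session picked up the settings, you can read them back. A small sketch; getConf() is the public counterpart of the private _conf used above:

spark.sparkContext.getConf().get('spark.executor.memory')  # expect '4g'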
answered Oct 14 '22 by bob


The correct way to modify Spark settings inline for a given SparkContext requires that the context be closed. For example:

from pyspark import SparkContext

# system properties must be set before the SparkContext is created
SparkContext.setSystemProperty('spark.driver.maxResultSize', '10g')
sc = SparkContext("local", "App Name")

source: https://spark.apache.org/docs/0.8.1/python-programming-guide.html

P.S. If you need to close the SparkContext, just use:

SparkContext.stop(sc)

and to double-check the settings currently in effect, you can use:

sc._conf.getAll()
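
As a small illustrative sketch (not part of the original answer), getAll() returns (key, value) pairs, so you can pull a single property out of the list:

dict(sc._conf.getAll()).get('spark.driver.maxResultSize')  # '10g' if the setting took effect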
answered Oct 14 '22 by abby sobh