 

spark 2.1.0 session config settings (pyspark)

I am trying to overwrite the Spark session/Spark context default configs, but it picks up the entire node/cluster resources.

spark = SparkSession.builder \
    .master("ip") \
    .enableHiveSupport() \
    .getOrCreate()

spark.conf.set("spark.executor.memory", '8g')
spark.conf.set('spark.executor.cores', '3')
spark.conf.set('spark.cores.max', '3')
spark.conf.set("spark.driver.memory", '8g')

sc = spark.sparkContext

It works fine when I put the configuration in spark-submit:

spark-submit --master ip --executor-cores 3 --driver-memory 10G code.py
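(For reference, a sketch of what the full equivalent would look like, with all four properties from the Python snippet mapped onto spark-submit flags; values here are illustrative, and spark.cores.max has no dedicated flag so it goes through --conf:)

spark-submit --master ip \
    --executor-memory 8g \
    --executor-cores 3 \
    --driver-memory 8g \
    --conf spark.cores.max=3 \
    code.py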
asked Jan 27 '17 by Harish

People also ask

Should I use SparkSession or SparkContext?

Once the SparkSession is instantiated, we can configure Spark's run-time config properties. From Spark 2.0.0 onwards it is better to use SparkSession, as it provides access to all of the functionality that SparkContext does; it also provides APIs for working with DataFrames and Datasets.
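A minimal sketch of that point (the app name and data here are illustrative): the session exposes the DataFrame API directly and still gives access to the underlying SparkContext:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("demo").getOrCreate()

# DataFrame/Dataset API comes straight from the session
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

# The classic RDD API is still reachable through the wrapped SparkContext
rdd = spark.sparkContext.parallelize([1, 2, 3])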


2 Answers

You aren't actually overwriting anything with this code. To see for yourself, try the following.

As soon as you start the pyspark shell, type:

sc.getConf().getAll() 

This will show you all of the current config settings. Then try your code and do it again. Nothing changes.
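A sketch of that experiment, assuming a running pyspark shell where sc and spark are already defined:

# Executor memory as recorded on the already-created context
before = dict(sc.getConf().getAll()).get('spark.executor.memory')

# Attempted runtime override on the existing session
spark.conf.set('spark.executor.memory', '8g')

# The context's config is fixed at creation time, so nothing changed
after = dict(sc.getConf().getAll()).get('spark.executor.memory')
print(before == after)  # True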

What you should do instead is create a new configuration and use that to create a SparkContext. Do it like this:

import pyspark

conf = pyspark.SparkConf().setAll([
    ('spark.executor.memory', '8g'),
    ('spark.executor.cores', '3'),
    ('spark.cores.max', '3'),
    ('spark.driver.memory', '8g'),
])

sc.stop()
sc = pyspark.SparkContext(conf=conf)

Then you can check for yourself, just as above, with:

sc.getConf().getAll() 

This should reflect the configuration you wanted.
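A related sketch: in Spark 2.x these properties can also be attached to the SparkSession builder before the first getOrCreate() call, which avoids stopping and recreating the context (this assumes no session exists yet in the process):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config('spark.executor.memory', '8g')
         .config('spark.executor.cores', '3')
         .config('spark.cores.max', '3')
         .config('spark.driver.memory', '8g')
         .getOrCreate())

sc = spark.sparkContext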

answered Oct 05 '22 by Grr


Update the configuration in Spark 2.3.1

To change the default Spark configurations, you can follow these steps:

Import the required classes

from pyspark.conf import SparkConf
from pyspark.sql import SparkSession

Get the default configurations

spark.sparkContext._conf.getAll() 

Update the default configurations

conf = spark.sparkContext._conf.setAll([
    ('spark.executor.memory', '4g'),
    ('spark.app.name', 'Spark Updated Conf'),
    ('spark.executor.cores', '4'),
    ('spark.cores.max', '4'),
    ('spark.driver.memory', '4g'),
])

Stop the current Spark Session

spark.sparkContext.stop() 

Create a Spark Session

spark = SparkSession.builder.config(conf=conf).getOrCreate() 
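As a quick check (a sketch using the property names from above), the recreated session should now report the updated values:

# The new context should carry the updated settings
spark.sparkContext.getConf().get('spark.executor.memory')  # expected: '4g'
spark.sparkContext.getConf().get('spark.app.name')         # expected: 'Spark Updated Conf'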
answered Oct 05 '22 by bob