
Spark 2.0: Redefining SparkSession params through GetOrCreate and NOT seeing changes in WebUI

I'm using Spark 2.0 with PySpark.

I am redefining SparkSession parameters through the getOrCreate method that was introduced in 2.0:

This method first checks whether there is a valid global default SparkSession, and if yes, return that one. If no valid global default SparkSession exists, the method creates a new SparkSession and assigns the newly created SparkSession as the global default.

In case an existing SparkSession is returned, the config options specified in this builder will be applied to the existing SparkSession.

https://spark.apache.org/docs/2.0.1/api/python/pyspark.sql.html#pyspark.sql.SparkSession.Builder.getOrCreate
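
A quick way to see that behavior in the pyspark shell (a minimal sketch, assuming the shell has already created the global spark session):

from pyspark.sql import SparkSession

# getOrCreate() hands back the existing global session rather than building
# a new one, so both names point at the same object.
same_session = SparkSession.builder.getOrCreate()
same_session is spark  # True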

So far so good:

from pyspark import SparkConf

SparkConf().toDebugString()
'spark.app.name=pyspark-shell\nspark.master=local[2]\nspark.submit.deployMode=client'

spark.conf.get("spark.app.name")
'pyspark-shell'

Then I redefine the SparkSession config, expecting the change to show up in the web UI as the docs promise:

appName(name)
Sets a name for the application, which will be shown in the Spark web UI.

https://spark.apache.org/docs/2.0.1/api/python/pyspark.sql.html#pyspark.sql.SparkSession.Builder.appName

c = SparkConf()
(c
 .setAppName("MyApp")
 .setMaster("local")
 .set("spark.driver.memory","1g")
 )

from pyspark.sql import SparkSession
(SparkSession
.builder
.enableHiveSupport() # metastore, serdes, Hive udf
.config(conf=c)
.getOrCreate())

spark.conf.get("spark.app.name")
'MyApp'

Now, when I go to localhost:4040, I would expect to see MyApp as the app name.

However, I still see the pyspark-shell application UI.

Where am I wrong?

Thanks in advance!

asked Nov 20 '16 by Sergey Bushmanov



1 Answer

I believe the documentation is a bit misleading here, and when you work with Scala you actually see a warning like this:

... WARN SparkSession$Builder: Use an existing SparkSession, some configuration may not take effect.

It was more obvious prior to Spark 2.0, with a clear separation between contexts:

  • SparkContext configuration cannot be modified at runtime. You have to stop the existing context first.
  • SQLContext configuration can be modified at runtime.

spark.app.name, like many other options, is bound to SparkContext, and cannot be modified without stopping the context.
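
This split is directly visible in the PySpark session from the question (a sketch, assuming the builder calls above have already been run in the pyspark shell):

spark.conf.get("spark.app.name")
# 'MyApp' - the session-level view of the option was updated

spark.sparkContext.appName
# 'pyspark-shell' - the SparkContext, which the web UI at localhost:4040 reports, is unchanged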

Reusing existing SparkContext / SparkSession

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

spark.conf.get("spark.sql.shuffle.partitions")
String = 200
val conf = new SparkConf()
  .setAppName("foo")
  .set("spark.sql.shuffle.partitions", "2001")

val spark = SparkSession.builder.config(conf).getOrCreate()
... WARN SparkSession$Builder: Use an existing SparkSession ...
spark: org.apache.spark.sql.SparkSession =  ...
spark.conf.get("spark.sql.shuffle.partitions")
String = 2001

While the spark.app.name config is updated:

spark.conf.get("spark.app.name")
String = foo

it doesn't affect SparkContext:

spark.sparkContext.appName
String = Spark shell

Stopping existing SparkContext / SparkSession

Now let's stop the session and repeat the process:

spark.stop
val spark = SparkSession.builder.config(conf).getOrCreate()
...  WARN SparkContext: Use an existing SparkContext ...
spark: org.apache.spark.sql.SparkSession = ...
spark.sparkContext.appName
String = foo

Interestingly, when we stop the session we still get a warning about using an existing SparkContext, but you can check that it has actually been stopped.
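
For the PySpark case in the question, the equivalent fix would look roughly like this (a sketch; it reuses the names from the question and assumes a local pyspark shell):

from pyspark.sql import SparkSession

# Stop the existing session together with its underlying SparkContext.
spark.stop()

# Rebuild it; this time the options are applied to a fresh SparkContext.
spark = (SparkSession
    .builder
    .appName("MyApp")
    .master("local")
    .config("spark.driver.memory", "1g")
    .enableHiveSupport()
    .getOrCreate())

spark.sparkContext.appName  # 'MyApp' - now also what the web UI at localhost:4040 shows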

answered Oct 12 '22 by zero323