I'm running a Spark batch job that uses SparkSession, as I need a lot of spark-sql features in each of my components. The SparkContext is initialized in my parent component and passed to the child components as a SparkSession.

In one of my child components, I want to add two more configurations to my SparkContext. To do that, I need to retrieve the SparkContext from the SparkSession, stop it, and recreate the SparkSession with the additional configuration. How can I retrieve the SparkContext from a SparkSession?
By using the getAll() method of SparkConf you can get all currently active Spark/PySpark SparkContext settings; you can also use the get() method to get the value of a specific setting.
SparkSession vs SparkContext: in earlier versions of Spark/PySpark, SparkContext (JavaSparkContext for Java) was the entry point to Spark programming with RDDs and for connecting to the Spark cluster. Since Spark 2.0, SparkSession has been introduced and became the entry point for programming with DataFrames and Datasets.
For an existing SparkConf, use the conf parameter of the builder:

>>> from pyspark.conf import SparkConf
>>> SparkSession.builder.config(conf=SparkConf())
Once the SparkSession is instantiated, we can configure Spark's run-time config properties. From Spark 2.0.0 onwards, it is better to use SparkSession, as it provides access to all the functionality that SparkContext does, and it additionally provides APIs to work with DataFrames and Datasets.
Just to post as an answer: the SparkContext can be accessed from a SparkSession using spark.sparkContext (an attribute, so no parentheses).