I am importing SparkSession as follows in PySpark:
from pyspark.sql import SparkSession
Then I create a SparkSession:
spark = SparkSession.builder.appName("test").getOrCreate()
and try to access the SparkContext:
spark.SparkContext.broadcast(...)
However, I get an error that SparkContext does not exist. How can I access it in order to set broadcast variables?
Since Spark 2.0, SparkSession is the entry point to the underlying Spark functionality. The SparkContext it wraps is still available through the session's sparkContext attribute, so everything SparkContext offers can be reached from a SparkSession.
Using the getAll() method of SparkConf, you can retrieve all settings of the currently active SparkContext. Use the get() method to retrieve the value of one specific setting.
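As a minimal sketch of reading those settings from a session: the helper names context_settings and context_setting below are hypothetical and chosen only for illustration. They assume spark is an existing SparkSession and rely on SparkContext.getConf(), SparkConf.getAll() and SparkConf.get().

```python
def context_settings(spark):
    """Return the settings of the session's SparkContext as a dict.

    Hypothetical helper; `spark` is assumed to be an existing SparkSession.
    """
    conf = spark.sparkContext.getConf()
    # getAll() returns a list of (key, value) pairs.
    return dict(conf.getAll())


def context_setting(spark, key, default=None):
    """Return one setting, or `default` if the key is not set."""
    return spark.sparkContext.getConf().get(key, default)
```

For example, context_setting(spark, "spark.app.name") should return the name passed to appName() when the session was built.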
SparkSession vs SparkContext: In earlier versions of Spark and PySpark, SparkContext (JavaSparkContext for Java) was the entry point for programming with RDDs and for connecting to a Spark cluster. Since Spark 2.0, SparkSession has been the entry point for programming with DataFrame and Dataset.
As a result, when comparing SparkSession vs SparkContext as of Spark 2.0.0, it is better to use SparkSession, because it provides access to the Spark features that the older entry points (SparkContext, SQLContext and HiveContext) expose.
You almost got it right; the attribute starts with a lowercase s:
>>> spark.sparkContext
<SparkContext master=local[*] appName=PySparkShell>
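Applied to the broadcast call from the question, a small sketch could look like this. The names broadcast_via_session and demo and the example dictionary are hypothetical; only sparkContext, broadcast() and the value attribute of the returned Broadcast object come from PySpark.

```python
def broadcast_via_session(spark, value):
    """Broadcast `value` through the SparkContext owned by `spark`.

    Hypothetical helper; `spark` is assumed to be an existing SparkSession.
    """
    # Note the lowercase "s": the attribute is sparkContext, not SparkContext.
    return spark.sparkContext.broadcast(value)


def demo():
    # Imported here so the helper above can be defined without starting Spark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("test").getOrCreate()
    lookup = broadcast_via_session(spark, {"a": 1, "b": 2})
    # The broadcast data is read back through the .value attribute.
    print(lookup.value["a"])
    spark.stop()
```

Calling demo() runs the example against a local session, assuming PySpark is installed.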
Assuming you have a Spark session:
spark_session = SparkSession \
    .builder \
    .enableHiveSupport() \
    .getOrCreate()
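The SparkContext can then be obtained using
spark_context = spark_session._sc
or
spark_context = spark_session.sparkContext
Both refer to the same object. Since _sc is a private attribute, prefer the public sparkContext property.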