I see that SparkSession doesn't have a .parallelize() method. Do we need to use SparkContext again to create an RDD? If so, is creating both a SparkSession and a SparkContext in a single program advisable?
Once you build your SparkSession, you can fetch the underlying SparkContext that was created with it, so there is no need to construct a separate one. Assuming a SparkSession is already defined:

val spark: SparkSession = ???

you can get the SparkContext from it:

val sc = spark.sparkContext
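
For completeness, here is a minimal sketch of building such a session from scratch; the master URL and application name below are illustrative assumptions, not values from the question:

import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder()
  .master("local[*]")        // assumption: run locally using all cores
  .appName("RddFromSession") // hypothetical application name
  .getOrCreate()

val sc = spark.sparkContext  // the SparkContext created alongside the session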
The SparkSession class exposes its SparkContext through the sparkContext method, so you can call parallelize on it directly:
val data = spark.sparkContext.parallelize(Seq(1,2,3,4))
data: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:23
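
The returned value is an ordinary RDD, so the usual transformations and actions apply. A small usage sketch:

// standard RDD operations on the result of parallelize
val doubled = data.map(_ * 2).collect()
// doubled: Array[Int] = Array(2, 4, 6, 8)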