parallelize() method while using SparkSession in Spark 2.0

I see that SparkSession doesn't have a .parallelize() method. Do we need to use SparkContext again to create an RDD? If so, is creating both a SparkSession and a SparkContext in a single program advisable?

asked Oct 06 '16 13:10 by vdep

2 Answers

Once you have built your SparkSession, you can fetch the underlying SparkContext it created as follows.

Assume the SparkSession is already defined:

val spark : SparkSession = ???

You can then get the SparkContext:

val sc = spark.sparkContext
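
For completeness, a minimal sketch of the whole flow: build a SparkSession, pull out its SparkContext, and parallelize a collection. The appName and master values here are illustrative, not from the original answer.

```scala
import org.apache.spark.sql.SparkSession

// Build (or reuse) a SparkSession; "local[*]" runs Spark locally
// using all available cores (illustrative setting).
val spark = SparkSession.builder()
  .appName("parallelize-example")
  .master("local[*]")
  .getOrCreate()

// The SparkContext is created along with the SparkSession --
// there is no need to construct a second one in the same program.
val sc = spark.sparkContext
val rdd = sc.parallelize(1 to 10)

println(rdd.sum())  // sums the elements 1..10

spark.stop()
```

This is why creating both a SparkSession and a separate SparkContext is unnecessary: the session already owns a context, and reusing it avoids the "only one SparkContext per JVM" restriction.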
answered Oct 27 '22 09:10 by eliasah

The SparkSession class exposes the SparkContext as a field, so you can call parallelize through it:

val data = spark.sparkContext.parallelize(Seq(1,2,3,4))
data: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:23
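
Once created this way, the RDD behaves as usual. A short sketch (assuming `spark` is an active SparkSession; the transformation shown is illustrative):

```scala
// RDD created through the session's context
val data = spark.sparkContext.parallelize(Seq(1, 2, 3, 4))

// Ordinary RDD transformations and actions work as expected
val doubled = data.map(_ * 2).collect()   // Array(2, 4, 6, 8)

// In Spark 2.x you can often skip the RDD entirely and build a
// Dataset directly from the local collection:
import spark.implicits._
val ds = Seq(1, 2, 3, 4).toDS()
```

Going through a Dataset is often preferable in Spark 2.x unless you specifically need the RDD API.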
answered Oct 27 '22 11:10 by loneStar