Why can't we create an RDD using a Spark session?

Tags:

apache-spark

rdd

When spark-shell starts, we see:

Spark context available as 'sc'.
Spark session available as 'spark'.

I read that a Spark session includes the Spark context, streaming context, Hive context ... If so, why can't we create an RDD using a Spark session instead of a Spark context?

scala> val a = sc.textFile("Sample.txt")
17/02/17 16:16:14 WARN util.SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
a: org.apache.spark.rdd.RDD[String] = Sample.txt MapPartitionsRDD[1] at textFile at <console>:24

scala> val a = spark.textFile("Sample.txt")
<console>:23: error: value textFile is not a member of org.apache.spark.sql.SparkSession
       val a = spark.textFile("Sample.txt")

As shown above, sc.textFile succeeds in creating an RDD, but spark.textFile does not.

588

asked Feb 17 '17 10:02

Sudha

1 Answer

In earlier versions of Spark, SparkContext was the entry point. Because the RDD was the main API, RDDs were created and manipulated through the context's APIs.

Every other API needed its own context: StreamingContext for streaming, SQLContext for SQL, and HiveContext for Hive.

But as the Dataset and DataFrame APIs became the new standard, Spark needed an entry point built for them. So Spark 2.0 introduced a new entry point for the Dataset and DataFrame APIs, called SparkSession.

SparkSession is essentially a combination of SQLContext, HiveContext and, in the future, StreamingContext.

All the APIs available on those contexts are also available on SparkSession. Internally, a SparkSession holds a SparkContext that does the actual computation.
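As a rough illustration of that relationship, here is a minimal sketch of a standalone program that builds a session, runs a SQL query through it, and then reaches the underlying context. The object name, app name, and local master setting are assumptions made for the example; they do not come from the question.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this sketch.
object SessionEntryPoint {
  def main(args: Array[String]): Unit = {
    // Build (or reuse) a session. "local[*]" is an assumed master for local testing.
    val spark = SparkSession.builder()
      .appName("session-entry-point")
      .master("local[*]")
      .getOrCreate()

    // DataFrame/SQL functionality is available directly on the session.
    val df = spark.sql("SELECT 1 AS id")
    df.show()

    // The session wraps a SparkContext, exposed as spark.sparkContext.
    val sc = spark.sparkContext
    println(sc.appName)

    spark.stop()
  }
}
```

In spark-shell the session and context are already created for you, so only the lines that use `spark` and `spark.sparkContext` would be needed there.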

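SparkContext still has the methods it had in previous versions, so RDD-creating methods such as textFile live there rather than on SparkSession. From a session, the context is reachable as spark.sparkContext.

The methods of SparkSession are listed in the SparkSession API documentation.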
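Applied to the question's example, a sketch along these lines should work in a Spark 2.x spark-shell. It assumes the same Sample.txt from the question exists in the working directory; the value names are illustrative.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell this returns the session that is already bound to `spark`.
val spark = SparkSession.builder().getOrCreate()

// RDD API: go through the SparkContext held by the session.
// This has type org.apache.spark.rdd.RDD[String], like sc.textFile in the question.
val linesRdd = spark.sparkContext.textFile("Sample.txt")

// Dataset API: the session's reader returns a Dataset[String], not an RDD.
val linesDs = spark.read.textFile("Sample.txt")

// A Dataset can be converted to an RDD when the RDD API is needed.
val rddFromDs = linesDs.rdd

// RDDs from local collections are also created through the context.
val numbers = spark.sparkContext.parallelize(Seq(1, 2, 3))
```

So the session does not replace the context for RDD work; it exposes the context and adds the Dataset and DataFrame entry points on top.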

180

answered Nov 14 '22 14:11

bob
