 

Difference between SparkContext, JavaSparkContext, SQLContext, and SparkSession?

  1. What is the difference between SparkContext, JavaSparkContext, SQLContext and SparkSession?
  2. Is there any method to convert or create a Context using a SparkSession?
  3. Can I completely replace all the Contexts using one single entry SparkSession?
  4. Are all the functions in SQLContext, SparkContext, and JavaSparkContext also in SparkSession?
  5. Some functions like parallelize have different behaviors in SparkContext and JavaSparkContext. How do they behave in SparkSession?
  6. How can I create the following using a SparkSession?

    • RDD
    • JavaRDD
    • JavaPairRDD
    • Dataset

Is there a method to transform a JavaPairRDD into a Dataset or a Dataset into a JavaPairRDD?

asked May 05 '17 by Manikandan Balasubramanian

People also ask

What is the difference between SQLContext and SparkSession?

In Spark, SparkSession is the entry point to the Spark application, and SQLContext is used to process structured data that contains rows and columns. Here, I will mainly focus on explaining the difference between SparkSession and SQLContext by defining each and describing how to create them.

Should I use SparkSession or SparkContext?

Once the SparkSession is instantiated, we can configure Spark's run-time config properties. From Spark 2.0.0 onwards, it is better to use SparkSession, as it provides access to all the functionality that SparkContext does and also provides APIs to work with DataFrames and Datasets.
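A minimal sketch of that, assuming local mode and a hypothetical app name; run-time properties are set through the session's RuntimeConfig after the session exists:

```java
import org.apache.spark.sql.SparkSession;

public class RuntimeConfigExample {
    public static void main(String[] args) {
        // Build (or reuse) the single SparkSession for this application.
        SparkSession spark = SparkSession.builder()
                .appName("runtime-config-demo") // hypothetical app name
                .master("local[*]")             // assumption: local mode for the demo
                .getOrCreate();

        // Run-time properties can be changed after the session exists,
        // unlike SparkConf settings, which are fixed at context creation.
        spark.conf().set("spark.sql.shuffle.partitions", "64");
        System.out.println(spark.conf().get("spark.sql.shuffle.partitions"));

        spark.stop();
    }
}
```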

What is the difference between SparkConf and SparkSession?

SparkConf holds the configuration parameters (key-value pairs) for a Spark application, such as the application name and master URL. SparkSession, by contrast, is the primary point of entry for Spark capabilities: it consumes that configuration and wraps a SparkContext, which represents the connection to a Spark cluster and is useful for building RDDs, accumulators, and broadcast variables on the cluster. It enables your Spark application to connect to the Spark cluster using a resource manager.
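A minimal sketch of the relationship, assuming local mode and a hypothetical app name: SparkConf carries the settings, and the SparkSession builder consumes them.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class SparkConfVsSession {
    public static void main(String[] args) {
        // SparkConf is just a bag of key-value configuration entries.
        SparkConf conf = new SparkConf()
                .setAppName("conf-demo")   // hypothetical app name
                .setMaster("local[*]")     // assumption: local mode for the demo
                .set("spark.executor.memory", "1g");

        // SparkSession is the entry point that consumes that configuration
        // and internally creates the SparkContext connection to the cluster.
        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();

        System.out.println(spark.sparkContext().appName());
        spark.stop();
    }
}
```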

What is the difference between SQLContext and HiveContext?

HiveContext is a superset of SQLContext. Its additional features include the ability to write queries using the more complete HiveQL parser, access to Hive UDFs, and the ability to read data from Hive tables. If you want to work with Hive, you have to use HiveContext.
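A minimal sketch, assuming a local-mode session, Hive libraries on the classpath, and a hypothetical Hive table named my_table; in Spark 2.x, enableHiveSupport() on the builder takes the place of creating a HiveContext:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveSupportExample {
    public static void main(String[] args) {
        // enableHiveSupport() turns on the HiveQL parser, Hive UDFs,
        // and Hive table access (requires Hive classes on the classpath).
        SparkSession spark = SparkSession.builder()
                .appName("hive-demo")  // hypothetical app name
                .master("local[*]")    // assumption: local mode for the demo
                .enableHiveSupport()
                .getOrCreate();

        // Assumption: a Hive table named `my_table` already exists.
        Dataset<Row> rows = spark.sql("SELECT * FROM my_table LIMIT 10");
        rows.show();

        spark.stop();
    }
}
```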


2 Answers

SparkContext is the Scala entry point to Spark functionality, and JavaSparkContext is a Java wrapper around SparkContext.

SQLContext is the entry point of Spark SQL, and it can be obtained from a SparkContext. Prior to 2.x, RDD, DataFrame, and Dataset were three different data abstractions. Since Spark 2.x, all three data abstractions are unified, and SparkSession is the unified entry point of Spark.

An additional note: RDDs are meant for unstructured data and are strongly typed, whereas DataFrames are for structured data and are loosely typed.
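A small sketch of that distinction (local mode and the sample data are assumptions): a Dataset<String> is checked at compile time, while a DataFrame (Dataset<Row>) is only checked at runtime.

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class TypedVsUntyped {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("typed-demo").master("local[*]").getOrCreate();

        // Strongly typed: Dataset<String> knows its element type at compile time.
        Dataset<String> typed =
                spark.createDataset(Arrays.asList("a", "b", "c"), Encoders.STRING());

        // Loosely typed: a DataFrame is just Dataset<Row>; column types are
        // only checked at runtime.
        Dataset<Row> untyped = typed.toDF("value");
        untyped.printSchema();

        spark.stop();
    }
}
```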

Is there any method to convert or create a Context using a SparkSession?

Yes: sparkSession.sparkContext() and, for SQL, sparkSession.sqlContext().
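A minimal Java sketch, assuming local mode: the contexts come straight off the session, and a JavaSparkContext can be wrapped around the underlying SparkContext.

```java
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;

public class ContextsFromSession {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("contexts-demo").master("local[*]").getOrCreate();

        // The underlying SparkContext and the legacy SQLContext are
        // exposed directly on the session.
        SparkContext sc = spark.sparkContext();
        SQLContext sqlContext = spark.sqlContext();

        // There is no JavaSparkContext accessor on SparkSession, but one
        // can be wrapped around the existing SparkContext.
        JavaSparkContext jsc = new JavaSparkContext(sc);

        System.out.println(sc.appName() + " / " + jsc.appName());
        spark.stop();
    }
}
```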

Can I completely replace all the Contexts using one single entry point, SparkSession?

Yes. You can get the respective contexts from the SparkSession.

Are all the functions in SQLContext, SparkContext, JavaSparkContext, etc. also added to SparkSession?

Not directly. You have to get the respective context and make use of it; think of it as backward compatibility.

How do I use such functions with a SparkSession?

Get the respective context and make use of it.

How do I create the following using a SparkSession?

  1. RDD: can be created from sparkSession.sparkContext.parallelize(???) (see the sketch after this list)
  2. JavaRDD: the same applies, but through the Java implementation (wrap the SparkContext in a JavaSparkContext)
  3. JavaPairRDD: sparkSession.sparkContext.parallelize(???).map(...) (turning your data into key-value pairs is one way)
  4. Dataset: what sparkSession returns is a Dataset if it is structured data.
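Below is a hedged Java sketch of all four, plus the JavaPairRDD/Dataset round trip asked about in the question; the sample data, schema, and local master are assumptions.

```java
import java.util.Arrays;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;
import scala.Tuple2;

public class CreateFromSession {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("create-demo").master("local[*]").getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // 1-2. RDD / JavaRDD via the wrapped context.
        JavaRDD<String> javaRdd = jsc.parallelize(Arrays.asList("a", "b", "c"));

        // 3. JavaPairRDD by pairing up the elements.
        JavaPairRDD<String, Integer> pairRdd =
                javaRdd.mapToPair(s -> new Tuple2<>(s, s.length()));

        // 4. Dataset directly from the session.
        Dataset<String> ds =
                spark.createDataset(Arrays.asList("a", "b"), Encoders.STRING());
        ds.show();

        // JavaPairRDD -> Dataset: convert the pairs to Rows and supply a schema.
        StructType schema = new StructType()
                .add("key", DataTypes.StringType)
                .add("value", DataTypes.IntegerType);
        JavaRDD<Row> rowRdd = pairRdd.map(t -> RowFactory.create(t._1(), t._2()));
        Dataset<Row> fromPairs = spark.createDataFrame(rowRdd, schema);
        fromPairs.show();

        // Dataset -> JavaPairRDD: go through toJavaRDD() and re-pair.
        JavaPairRDD<String, Integer> backToPairs = fromPairs.toJavaRDD()
                .mapToPair(r -> new Tuple2<>(r.getString(0), r.getInt(1)));
        System.out.println(backToPairs.collect());

        spark.stop();
    }
}
```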
answered Sep 23 '22 by Balaji Reddy


Explanation from the Spark source code under branch-2.1:

SparkContext: Main entry point for Spark functionality. A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster.

Only one SparkContext may be active per JVM. You must stop() the active SparkContext before creating a new one. This limitation may eventually be removed; see SPARK-2243 for more details.

JavaSparkContext: A Java-friendly version of [[org.apache.spark.SparkContext]] that returns [[org.apache.spark.api.java.JavaRDD]]s and works with Java collections instead of Scala ones.

Only one SparkContext may be active per JVM. You must stop() the active SparkContext before creating a new one. This limitation may eventually be removed; see SPARK-2243 for more details.

SQLContext: The entry point for working with structured data (rows and columns) in Spark 1.x.

As of Spark 2.0, this is replaced by [[SparkSession]]. However, we are keeping the class here for backward compatibility.

SparkSession: The entry point to programming Spark with the Dataset and DataFrame API.
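A minimal sketch of that entry point, assuming local mode: both the Dataset API and SQL hang off the session, where Spark 1.x needed a separate SQLContext.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SessionEntryPoint {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("entry-point-demo").master("local[*]").getOrCreate();

        // The Dataset/DataFrame API and SQL are both reached through the
        // session, with no separate SQLContext needed.
        Dataset<Long> ids = spark.range(1, 4);
        ids.createOrReplaceTempView("ids");
        Dataset<Row> doubled = spark.sql("SELECT id * 2 AS doubled FROM ids");
        doubled.show();

        spark.stop();
    }
}
```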

answered Sep 20 '22 by Deanzz