I am running Spark 2, Hive, and Hadoop on my local machine, and I want to use Spark SQL to read data from a Hive table.
Everything works fine when Hadoop is running at the default hdfs://localhost:9000, but if I change to a different port in core-site.xml:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9099</value>
</property>
then running a simple query spark.sql("select * from archive.tcsv3 limit 100").show()
in spark-shell gives me the error:
ERROR metastore.RetryingHMSHandler: AlreadyExistsException(message:Database default already exists)
.....
From local/147.214.109.160 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused;
.....
I was getting the AlreadyExistsException before this change too, and it doesn't seem to affect the result.
I can make it work by stopping the existing SparkContext and creating a new one:
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

// Stop the context the shell created at startup, then build a fresh one
// that picks up the current Hadoop configuration.
sc.stop()
val sc = new SparkContext()
val session = SparkSession.builder().master("local").appName("test").enableHiveSupport().getOrCreate()
session.sql("show tables").show()
My question is: why did the initial SparkSession/SparkContext not pick up the correct configuration, and how can I fix it? Thanks!
All functionality available with SparkContext is also available through SparkSession, which additionally provides APIs for working with DataFrames and Datasets.
In earlier versions of Spark and PySpark, SparkContext was the entry point for programming with RDDs and connecting to the Spark cluster. With the introduction of Spark 2.0, SparkSession became the entry point for programming with DataFrames and Datasets.
SparkSession encapsulates SparkContext and lets you set Spark configuration parameters. Through the SparkContext, the driver can also access the other contexts, such as SQLContext, HiveContext, and StreamingContext.
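For example, a single SparkSession gives you both the Dataset API and, through its wrapped SparkContext, the RDD API. This is a minimal sketch; the app name is just a placeholder:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local")
  .appName("entry-point-demo")
  .getOrCreate()

import spark.implicits._

// Dataset API directly on the session
val ds = Seq(1, 2, 3).toDS()

// RDD API through the encapsulated SparkContext
val rdd = spark.sparkContext.parallelize(Seq(1, 2, 3))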
If you are using SparkSession and want to set configuration on the underlying SparkContext, use session.sparkContext:
import org.apache.spark.sql.SparkSession

val session = SparkSession
  .builder()
  .appName("test")
  .enableHiveSupport()
  .getOrCreate()

import session.implicits._

// Hadoop-level settings go on the Hadoop configuration of the wrapped context
session.sparkContext.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
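By the same pattern, the stale fs.defaultFS from the question could be overridden on the running session before querying Hive. This is a minimal sketch, assuming the NameNode from the question is listening on localhost:9099; note that table locations already recorded in the Hive metastore may still point at the old port, in which case the metastore entries themselves would need updating:

// Hypothetical sketch: point the session's Hadoop config at the new NameNode
// port (hdfs://localhost:9099 from the question) before running Hive queries.
session.sparkContext.hadoopConfiguration.set("fs.defaultFS", "hdfs://localhost:9099")
session.sql("select * from archive.tcsv3 limit 100").show()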
You don't need to import SparkContext or create it before the SparkSession.