sparkSession/sparkContext cannot get Hadoop configuration

I am running Spark 2, Hive, and Hadoop on my local machine, and I want to use Spark SQL to read data from a Hive table.

It all works fine when Hadoop is running at the default hdfs://localhost:9000, but not once I change it to a different port in core-site.xml.


Running a simple query, spark.sql("select * from archive.tcsv3 limit 100").show(), in spark-shell then gives me this error:

ERROR metastore.RetryingHMSHandler: AlreadyExistsException(message:Database default already exists)
From local/ to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused;

I was already getting the AlreadyExistsException before the port change, and it doesn't seem to affect the result.

I can make it work by creating a new sparkContext:

import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession
var sc = new SparkContext()
val session = SparkSession.builder().master("local").appName("test").enableHiveSupport().getOrCreate()
session.sql("show tables").show()

My question is: why did the initial sparkSession/sparkContext not get the correct configuration, and how can I fix it? Thanks!

1 Answer

If you are using SparkSession and you want to set configuration on the Spark context, use session.sparkContext:

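import org.apache.spark.sql.SparkSession

// Build (or reuse) the session; it creates the SparkContext for you.
val session = SparkSession.builder().master("local").appName("test").enableHiveSupport().getOrCreate()
import session.implicits._

// Set Hadoop properties through the session's underlying SparkContext.
session.sparkContext.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")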

You don't need to import SparkContext or create it before the SparkSession.
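
As a rough sketch of the same approach applied to the question's setting, the snippet below prints the fs.defaultFS value the session actually picked up and then overrides it on the session's own context. The port 9001, the object name HadoopConfCheck, and the helper overrideDefaultFs are placeholders made up for illustration, not values from the question; substitute the URI from your own core-site.xml. It assumes Spark 2.x is on the classpath and leaves out enableHiveSupport() to keep the example small.

```scala
import org.apache.spark.sql.SparkSession

object HadoopConfCheck {
  // Hypothetical placeholder: use the URI from your own core-site.xml.
  val nameNodeUri = "hdfs://localhost:9001"

  // Sets fs.defaultFS on the session's existing SparkContext
  // and returns the value that is now in effect.
  def overrideDefaultFs(session: SparkSession, uri: String): String = {
    val hadoopConf = session.sparkContext.hadoopConfiguration
    hadoopConf.set("fs.defaultFS", uri)
    hadoopConf.get("fs.defaultFS")
  }

  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder().master("local").appName("test").getOrCreate()
    // What the session picked up from the Hadoop config files on its classpath.
    println(session.sparkContext.hadoopConfiguration.get("fs.defaultFS"))
    // Override it and print the value now seen by the session.
    println(overrideDefaultFs(session, nameNodeUri))
    session.stop()
  }
}
```

In spark-shell the session already exists as spark, so only the hadoopConfiguration calls are needed. If the first value printed is still hdfs://localhost:9000, it may be worth checking that HADOOP_CONF_DIR points at the directory containing the edited core-site.xml.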

