 

In Apache Spark SQL, how to close the metastore connection from a HiveContext

My project has unit tests for different HiveContext configurations (sometimes they live in one file, as they are grouped by feature).

After upgrading to Spark 1.4 I encounter a lot of 'java.sql.SQLException: Another instance of Derby may have already booted the database' problems, because a patch makes those contexts unable to share the same metastore. Since it's not clean to revert the state of a singleton for every test, my only option boils down to "recycling" each context by terminating the previous Derby metastore connection. Is there a way to do this?

asked Aug 24 '15 by tribbloid

People also ask

How do you close a SparkSession?

Use the stop method to end the Spark session.
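For example, a minimal sketch using the Spark 2.x+ SparkSession API (the app name and master below are illustrative):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("example")   // illustrative app name
      .master("local[*]")   // illustrative master
      .getOrCreate()

    // ... run queries ...

    spark.stop()  // ends the session and releases the underlying SparkContext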

What is stored in the Metastore for a Spark SQL table?

A Hive metastore warehouse (aka spark-warehouse) is the directory where Spark SQL persists tables whereas a Hive metastore (aka metastore_db) is a relational database to manage the metadata of the persistent relational entities, e.g. databases, tables, columns, partitions.
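To make the distinction concrete, here is a hedged sketch (Spark 2.x API; the warehouse path is illustrative):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("warehouse-demo")                                  // illustrative
      .master("local[*]")
      .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse")  // where table data files are persisted
      .enableHiveSupport()  // metadata (databases, tables, columns, partitions) goes to the metastore
      .getOrCreate()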

How do I get SparkContext from SQLContext?

The entry point into all functionality in Spark SQL is the SQLContext class, or one of its descendants. To create a basic SQLContext, all you need is a SparkContext.
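In Scala (Spark 1.x API; the app name and master below are illustrative), that looks like:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // An existing SparkContext is the only prerequisite.
    val sc = new SparkContext(new SparkConf().setAppName("example").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)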


1 Answer

Well, in Scala I just used FunSuite for unit tests together with the BeforeAndAfterAll trait. You can init your SparkContext in beforeAll, spawn your HiveContext from it, and finish it like this:

  override def afterAll(): Unit = {
    if (sparkContext != null) {
      sparkContext.stop()  // stop the context created in beforeAll
    }
  }

From what I've noticed, it also closes the HiveContext attached to it.
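Put together, a sketch of the whole pattern might look like this (the suite name, app name, and sample test are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext
    import org.scalatest.{BeforeAndAfterAll, FunSuite}

    class HiveQuerySuite extends FunSuite with BeforeAndAfterAll {

      @transient private var sparkContext: SparkContext = _
      @transient private var hiveContext: HiveContext = _

      override def beforeAll(): Unit = {
        sparkContext = new SparkContext(
          new SparkConf().setAppName("hive-test").setMaster("local[*]"))
        hiveContext = new HiveContext(sparkContext)
      }

      override def afterAll(): Unit = {
        // Stopping the SparkContext also shuts down the attached HiveContext,
        // releasing the Derby metastore connection for the next suite.
        if (sparkContext != null) {
          sparkContext.stop()
        }
      }

      test("simple query") {
        assert(hiveContext.sql("SELECT 1").collect().head.getInt(0) === 1)
      }
    }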

answered Oct 13 '22 by TheMP