My project has unit tests for different HiveContext configurations (sometimes they live in one file, grouped by feature).
After upgrading to Spark 1.4 I run into a lot of 'java.sql.SQLException: Another instance of Derby may have already booted the database' errors, because a patch made those contexts unable to share the same metastore. Since it's not clean to reset the state of a singleton for every test, my only option boils down to "recycling" each context by terminating the previous Derby metastore connection. Is there a way to do this?
Use the stop method to end the Spark session before the next test boots its own context.
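For example, with a plain SparkContext/HiveContext pair (a minimal sketch; the master, app name, and test body are placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("hive-test"))
val hiveContext = new HiveContext(sc)
try {
  // ... run feature tests against hiveContext ...
} finally {
  sc.stop()  // release the context so the next suite can boot its own metastore
}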
A Hive metastore warehouse (aka spark-warehouse) is the directory where Spark SQL persists tables, whereas a Hive metastore (aka metastore_db) is a relational database that manages the metadata of the persistent relational entities, e.g. databases, tables, columns, and partitions.
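If you would rather isolate test suites than recycle one context, you can point each HiveContext at its own Derby database and warehouse directory. This is a sketch under assumptions: the temp-dir layout is made up, and whether setConf takes effect before the embedded metastore client initializes can depend on the Spark version.

import java.nio.file.Files
import org.apache.spark.sql.hive.HiveContext

val base = Files.createTempDirectory("hive-test").toFile.getAbsolutePath
val hiveContext = new HiveContext(sparkContext)  // assumes an existing SparkContext
// Give this suite its own embedded Derby metastore and warehouse dir,
// so no two suites contend for the same db.lck file.
hiveContext.setConf("javax.jdo.option.ConnectionURL",
  s"jdbc:derby:;databaseName=$base/metastore_db;create=true")
hiveContext.setConf("hive.metastore.warehouse.dir", s"$base/warehouse")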
The entry point into all functionality in Spark SQL is the SQLContext class, or one of its descendants. To create a basic SQLContext, all you need is a SparkContext:

JavaSparkContext sc = ...; // An existing JavaSparkContext.
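The Scala equivalent looks roughly like this (master and app name are placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setMaster("local[2]").setAppName("sqlcontext-example")
val sc = new SparkContext(conf)       // an existing or freshly created SparkContext
val sqlContext = new SQLContext(sc)   // the entry point for Spark SQL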
Well, in Scala I just used FunSuite for unit tests together with the BeforeAndAfterAll trait. You can init your SparkContext in beforeAll, spawn your HiveContext from it, and finish it like this:
override def afterAll(): Unit = {
  if (sparkContext != null) {
    sparkContext.stop()
  }
}
From what I've noticed, stopping the SparkContext also closes the HiveContext attached to it.
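Put together, a full suite might look like the sketch below (the class and test names are hypothetical; range(0, 10) is just a smoke test):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.scalatest.{BeforeAndAfterAll, FunSuite}

class FeatureASuite extends FunSuite with BeforeAndAfterAll {
  @transient private var sparkContext: SparkContext = _
  @transient private var hiveContext: HiveContext = _

  override def beforeAll(): Unit = {
    super.beforeAll()
    val conf = new SparkConf().setMaster("local[2]").setAppName("FeatureASuite")
    sparkContext = new SparkContext(conf)
    hiveContext = new HiveContext(sparkContext)
  }

  override def afterAll(): Unit = {
    if (sparkContext != null) {
      sparkContext.stop()  // also shuts down the attached HiveContext
    }
    super.afterAll()
  }

  test("HiveContext is usable") {
    assert(hiveContext.range(0, 10).count() === 10)
  }
}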