I am trying to run SparkSQL :
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
But the error i m getting is below:
... 125 more
Caused by: java.sql.SQLException: Another instance of Derby may have already booted the database /root/spark/bin/metastore_db.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
... 122 more
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /root/spark/bin/metastore_db.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown Source)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown Source)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source)
I see there is a metastore_db folder exists..
My hive metastore includes mysql as metastore.But not sure why the error shows as derby execption
I was getting the same error while creating Data frames on Spark Shell :
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /metastore_db.
Cause:
I found that this is happening as there were multiple other instances of Spark-Shell already running and holding derby DB already, so when i was starting yet another Spark Shell and creating Data Frame on it using RDD.toDF() it was throwing error:
Solution:
I ran the ps command to find other instances of Spark-Shell:
ps -ef | grep spark-shell
and i killed them all using kill command:
kill -9 Spark-Shell-processID ( example: kill -9 4848)
after all the SPark-Shell instances were gone, i started a new SPark SHell and reran my Data frame function and it ran just fine :)
If you're running in spark shell, you shouldn't instantiate a HiveContext, there's one created automatically called sqlContext
(the name is misleading - if you compiled Spark with Hive, it will be a HiveContext). See similar discussion here.
If you're not running in shell - this exception means you've created more than one HiveContext in the same JVM, which seems to be impossible - you can only create one.
an lck(lock) file is an access control file which locks the database so that only a single user can access or update the database. The error suggests that there is another instance which is using the same database. Thus you need to delete the .lck files. In your home directory, go to metastore_db and delete any .lck files.
Another case where you can see the same error is a Spark REPL of an AWS Glue dev endpoint, when you are trying to convert a dynamic frame into a dataframe.
There are actually several different exceptions like:
pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':"
ERROR XSDB6: Another instance of Derby may have already booted the database /home/glue/metastore_db.
java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader
The solution is hard to find with google but eventually it is described here.
The loaded REPL contains an instantiated SparkSession
in a variable spark
and you just need to stop it before creating a new SparkContext
:
>>> spark.stop()
>>> from pyspark.context import SparkContext
>>> from awsglue.context import GlueContext
>>>
>>> glue_context = GlueContext(SparkContext.getOrCreate())
>>> glue_frame = glue_context.create_dynamic_frame.from_catalog(database=DB_NAME, table_name=T_NAME)
>>> df = glue_frame.toDF()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With