I'm trying to run a notebook on the Analytics for Apache Spark service on Bluemix, but I hit the following error:
Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and
run build/sbt assembly", Py4JJavaError(u'An error occurred while calling
None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o38))
The error is intermittent - it doesn't always happen. The line of code in question is:
df = sqlContext.read.format('jdbc').options(
    url=url,
    driver='com.ibm.db2.jcc.DB2Driver',
    dbtable='SAMPLE.ASSETDATA'
).load()
There are a few similar questions on Stack Overflow, but they aren't asking about the Spark service on Bluemix.
That statement initializes a HiveContext under the covers. The HiveContext then initializes a local Derby database to hold its metadata. The Derby database is created in the current directory by default. The reported problem occurs under these circumstances (among others):

1. A previous kernel left stale lock files in the metastore_db directory, for example after a crash, so a new HiveContext cannot acquire the lock on the Derby database.
2. More than one kernel is running in the same working directory, and only one of them at a time can hold the lock on the shared Derby database.
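To see which case applies, you can inspect the default metastore location from within the notebook. This is a minimal sketch using only the Python standard library; the ./metastore_db path assumes Derby's default of the current working directory:

import glob
import os

print(os.getcwd())                        # directory where Derby creates metastore_db
print(glob.glob('./metastore_db/*.lck'))  # leftover lock files indicate case 1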
Until IBM changes the default setup to avoid this problem, possible workarounds are:
For case 1, delete the leftover lock files. From a Python notebook, this is done by executing:
!rm -f ./metastore_db/*.lck
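If the shell escape is not available in your environment, the same cleanup can be done in plain Python. A minimal sketch, again assuming Derby's default ./metastore_db location:

import glob
import os

for lck in glob.glob('./metastore_db/*.lck'):
    os.remove(lck)  # delete a stale Derby lock file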
For case 2, change the current working directory before the HiveContext is created. In a Python notebook, the following changes into a newly created temporary directory:
import os
import tempfile
os.chdir(tempfile.mkdtemp())  # move into a fresh directory so Derby creates its database there
But beware: it will clutter the filesystem with a new directory and Derby database each time you run that notebook.
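If the clutter becomes a problem, one alternative is to reuse a single fixed scratch directory per notebook instead of creating a fresh one on every run. This is only a sketch; the directory name spark_metastore_scratch is an arbitrary choice, and a stale lock file in the reused directory may still require the case 1 cleanup:

import os
import tempfile

# Reuse one fixed scratch directory instead of a new one per run
workdir = os.path.join(tempfile.gettempdir(), 'spark_metastore_scratch')
if not os.path.isdir(workdir):
    os.makedirs(workdir)
os.chdir(workdir)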
I happen to know that IBM is working on a fix. Please use these workarounds only if you encounter the problem, not proactively.