How do I get independent service Zeppelin to see Hive?

I am using HDP-2.6.0.3 but I need Zeppelin 0.8, so I have installed it as an independent service. When I run:

%sql 
show tables

I get nothing back, and Spark2 SQL commands fail with 'table not found'. The tables are visible in the Zeppelin 0.7 that ships with HDP.

Can anyone tell me what I am missing, for Zeppelin/Spark to see Hive?

The steps I performed to build and set up Zeppelin 0.8 are as follows:

mvn clean package -DskipTests -Pspark-2.1 -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn -Ppyspark -Psparkr -Pr -Pscala-2.11

Copied zeppelin-site.xml and shiro.ini from /usr/hdp/2.6.0.3-8/zeppelin/conf to /home/ed/zeppelin/conf.

Created /home/ed/zeppelin/conf/zeppelin-env.sh, in which I put the following:

export JAVA_HOME=/usr/jdk64/jdk1.8.0_112
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.6.0.3-8"

Copied /etc/hive/conf/hive-site.xml to /home/ed/zeppelin/conf
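
One thing that may be worth ruling out after these steps is whether the copied hive-site.xml actually names a remote metastore. When `hive.metastore.uris` is empty, Hive clients generally fall back to a local embedded metastore (Derby by default), which would contain no tables. The sketch below uses only the JDK XML parser; `childText` and `hadoopProperty` are hypothetical helper names, and the path is the one from the steps above. It checks the file's contents only, not whether the Spark interpreter actually loads that file.

```scala
import java.io.File
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Hypothetical helper: text of the first <tag> element under `parent`, if any.
def childText(parent: Element, tag: String): Option[String] = {
  val nodes = parent.getElementsByTagName(tag)
  if (nodes.getLength > 0) Some(nodes.item(0).getTextContent.trim) else None
}

// Hypothetical helper: value of a Hadoop-style <property> with the given name.
def hadoopProperty(file: File, name: String): Option[String] = {
  val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file)
  val props = doc.getElementsByTagName("property")
  (0 until props.getLength)
    .map(i => props.item(i).asInstanceOf[Element])
    .find(p => childText(p, "name").contains(name))
    .flatMap(p => childText(p, "value"))
}

// Path taken from the steps above.
val siteFile = new File("/home/ed/zeppelin/conf/hive-site.xml")
if (siteFile.exists()) {
  println(hadoopProperty(siteFile, "hive.metastore.uris")
    .getOrElse("hive.metastore.uris is not set"))
} else {
  println(s"$siteFile not found")
}
```

If the property is present, the remaining question is whether the interpreter's Spark session was created with Hive support; in a Spark paragraph, `spark.conf.get("spark.sql.catalogImplementation", "in-memory")` should report `hive` when it was.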

EDIT: I have also tried:

import org.apache.spark.sql.SparkSession
val spark = SparkSession
          .builder()
          .appName("interfacing spark sql to hive metastore without configuration file")
          .config("hive.metastore.uris", "thrift://s2.royble.co.uk:9083") // replace with your hivemetastore service's thrift url
          .config("url", "jdbc:hive2://s2.royble.co.uk:10000/default")
          .config("UID", "admin")
          .config("PWD", "admin")
          .config("driver", "org.apache.hive.jdbc.HiveDriver")
          .enableHiveSupport() // don't forget to enable hive support
          .getOrCreate()

This gave the same result. I also tried:

import java.sql.{DriverManager, Connection, Statement, ResultSet}
val url = "jdbc:hive2://"
val driver = "org.apache.hive.jdbc.HiveDriver"
val user = "admin"
val password = "admin"
Class.forName(driver).newInstance
val conn: Connection = DriverManager.getConnection(url, user, password)

which gives:

 java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
ERROR XSDB6: Another instance of Derby may have already booted the database /home/ed/metastore_db

I fixed that error with:

val url = "jdbc:hive2://s2.royble.co.uk:10000"

but still no tables :(

asked Oct 18 '17 12:10

schoon

1 Answer

This works:

import java.sql.{DriverManager, Connection, Statement, ResultSet}
val url = "jdbc:hive2://s2.royble.co.uk:10000"
val driver = "org.apache.hive.jdbc.HiveDriver"
val user = "admin"
val password = "admin"
Class.forName(driver).newInstance
val conn: Connection = DriverManager.getConnection(url, user, password)
val r: ResultSet = conn.createStatement.executeQuery("SELECT * FROM tweetsorc0")

However, I then have the pain of converting the ResultSet to a DataFrame. I would rather SparkSession worked and gave me a DataFrame directly, so I will add a bounty later today.
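
In the meantime, the conversion can be sketched as below. This is a minimal illustration and not a tested solution for this cluster: `resultSetToRows` is a hypothetical helper name, every value is read as a string, and the whole result is pulled into driver memory, so it only suits small result sets.

```scala
import java.sql.ResultSet
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper: read all remaining rows of a ResultSet as strings.
// Returns (column labels, rows); SQL NULLs come back as null.
def resultSetToRows(rs: ResultSet): (Vector[String], Vector[Vector[String]]) = {
  val meta = rs.getMetaData
  val columnCount = meta.getColumnCount
  // JDBC columns are 1-indexed.
  val columns = (1 to columnCount).map(i => meta.getColumnLabel(i)).toVector
  val rows = ArrayBuffer.empty[Vector[String]]
  while (rs.next()) {
    rows += (1 to columnCount).map(i => rs.getString(i)).toVector
  }
  (columns, rows.toVector)
}
```

With `val (columns, rows) = resultSetToRows(r)`, an all-string DataFrame could then be built in a Spark paragraph along the lines of `spark.createDataFrame(spark.sparkContext.parallelize(rows.map(x => Row.fromSeq(x))), StructType(columns.map(c => StructField(c, StringType))))`, given imports of `org.apache.spark.sql.Row` and `org.apache.spark.sql.types._`. The Hive JDBC driver may return labels in `table.column` form, so the columns might need renaming afterwards.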

answered Oct 22 '22 00:10

schoon


Tags:

apache-spark

hive

hortonworks-data-platform

apache-zeppelin
