
How do I get independent service Zeppelin to see Hive?

I am using HDP-2.6.0.3 but I need Zeppelin 0.8, so I have installed it as an independent service. When I run:

%sql 
show tables

Nothing comes back, and Spark2 SQL commands fail with 'table not found'. The same tables are visible in the Zeppelin 0.7 that ships with HDP.

Can anyone tell me what I am missing for Zeppelin/Spark to see Hive?

The steps I performed to build and configure Zeppelin 0.8 are as follows:

mvn clean package -DskipTests -Pspark-2.1 -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn -Ppyspark -Psparkr -Pr -Pscala-2.11

Copied zeppelin-site.xml and shiro.ini from /usr/hdp/2.6.0.3-8/zeppelin/conf to /home/ed/zeppelin/conf.

Created /home/ed/zeppelin/conf/zeppelin-env.sh, in which I put the following:

export JAVA_HOME=/usr/jdk64/jdk1.8.0_112
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.6.0.3-8"

Copied /etc/hive/conf/hive-site.xml to /home/ed/zeppelin/conf

EDIT: I have also tried:

import org.apache.spark.sql.SparkSession
val spark = SparkSession
          .builder()
          .appName("interfacing spark sql to hive metastore without configuration file")
          .config("hive.metastore.uris", "thrift://s2.royble.co.uk:9083") // replace with your hivemetastore service's thrift url
          .config("url", "jdbc:hive2://s2.royble.co.uk:10000/default")
          .config("UID", "admin")
          .config("PWD", "admin")
          .config("driver", "org.apache.hive.jdbc.HiveDriver")
          .enableHiveSupport() // don't forget to enable hive support
          .getOrCreate()

That gave the same result. I also tried:

import java.sql.{DriverManager, Connection, Statement, ResultSet}
val url = "jdbc:hive2://"
val driver = "org.apache.hive.jdbc.HiveDriver"
val user = "admin"
val password = "admin"
Class.forName(driver).newInstance
val conn: Connection = DriverManager.getConnection(url, user, password)

which gives:

 java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
ERROR XSDB6: Another instance of Derby may have already booted the database /home/ed/metastore_db

I fixed that error by giving the JDBC URL a host and port:

val url = "jdbc:hive2://s2.royble.co.uk:10000"

but I still see no tables :(

Asked by schoon, Oct 18 '17 12:10



1 Answer

This works:

import java.sql.{DriverManager, Connection, Statement, ResultSet}
val url = "jdbc:hive2://s2.royble.co.uk:10000"
val driver = "org.apache.hive.jdbc.HiveDriver"
val user = "admin"
val password = "admin"
Class.forName(driver).newInstance
val conn: Connection = DriverManager.getConnection(url, user, password)
val r: ResultSet = conn.createStatement.executeQuery("SELECT * FROM tweetsorc0")

but then I have the pain of converting the ResultSet to a DataFrame. I'd rather have SparkSession work and give me a DataFrame directly, so I will add a bounty later today.
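
For anyone stuck with the same workaround, the manual conversion could look roughly like the sketch below. It is not a fix for the metastore problem. The helper name resultSetToDataFrame is made up for illustration, every column is read as a string, and all rows are copied to the driver first, so it only suits small results. It assumes the spark session that Zeppelin's Spark2 interpreter provides, and it uses the column labels exactly as the JDBC driver reports them.

```scala
import java.sql.ResultSet
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical helper (not part of Spark or Hive): drains a JDBC ResultSet
// on the driver and builds a DataFrame whose columns are all strings.
def resultSetToDataFrame(spark: SparkSession, rs: ResultSet): DataFrame = {
  val meta = rs.getMetaData
  val columnCount = meta.getColumnCount

  // JDBC columns are 1-indexed; use the labels the driver reports.
  val schema = StructType(
    (1 to columnCount).map(i => StructField(meta.getColumnLabel(i), StringType, nullable = true))
  )

  // Advance the cursor until next() returns false, copying each row.
  // getString returns null for SQL NULL, which the nullable schema allows.
  val rows = Iterator
    .continually(rs)
    .takeWhile(_.next())
    .map(r => Row.fromSeq((1 to columnCount).map(i => r.getString(i))))
    .toList

  spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
}

// Usage with the ResultSet r from the snippet above:
// val df = resultSetToDataFrame(spark, r)
// df.show()
```

Columns can be cast to proper types afterwards if needed. The string-only schema just keeps the sketch short.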

Answered by schoon, Oct 22 '22 00:10