
R Hive Thrift Client

I'm working on adding HiveServer2 support to my company's R data-access package. I'm curious what the best way of generating an R Thrift client would be. I'm considering writing an R wrapper around the Java Thrift client, similar to what rhbase does, but I'd prefer a pure R solution, if possible.

Things to note:

  • The HiveServer2 Thrift server exposes a different Thrift interface from the original Hive Thrift server, so clients written for the original server do not work with it.
  • I've looked at and used the RHive package. Among other issues I have with it, it requires a system-install of Hadoop and Hive, which will not always be available on R client machines.
  • My somewhat horrible - but currently sufficient - workaround is to wrap the beeline client in some R goodness.
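
For illustration, the beeline workaround in the last bullet might look roughly like the sketch below. The function names `beeline_args` and `beeline_query` are hypothetical, and the sketch assumes that `beeline` is on the `PATH` and that its `csv2` output format parses cleanly with `read.csv`.

```r
# Build the argument vector for one beeline call (hypothetical helper).
beeline_args <- function(url, query, user = NULL, password = NULL) {
  # csv2 output and silent mode keep stdout limited to the result rows
  args <- c("-u", url, "--outputformat=csv2", "--silent=true")
  if (!is.null(user)) args <- c(args, "-n", user)
  if (!is.null(password)) args <- c(args, "-p", password)
  c(args, "-e", query)
}

# Run the query through beeline and parse its CSV output into a data frame.
# Assumption: the query returns at least a header line; read.csv fails on empty output.
beeline_query <- function(url, query, ..., beeline = "beeline") {
  args <- beeline_args(url, query, ...)
  out <- system2(beeline, shQuote(args), stdout = TRUE)
  read.csv(text = paste(out, collapse = "\n"), stringsAsFactors = FALSE)
}
```

A call such as `beeline_query("jdbc:hive2://ixxx:10000/default", "show databases", user = "hive")` would then return the result as a data frame. Every call starts a new JVM, which is one reason this approach stays a workaround.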
asked Jul 19 '13 20:07 by yoni




1 Answer

The exact scope of this question may be too broad for Stack Overflow, and the asker confirmed that he abandoned this quest, but for future readers this is probably the thing to look for:

From R you can connect to Hive with JDBC.

This is not exactly what the asker came for, but it should serve the purpose in most cases.


The key part of this solution is the RJDBC package. Here is some example code found on the Cloudera Community:

library(DBI)
library(rJava)
library(RJDBC)

# Collect the JARs the Hive JDBC driver depends on (HDP layout; adjust the paths to your install).
hadoop.class.path <- list.files(path = "/usr/hdp/2.4.0.0-169/hadoop", pattern = "jar", full.names = TRUE)
hive.class.path <- list.files(path = "/usr/hdp/current/hive-client/lib", pattern = "jar", full.names = TRUE)
hadoop.lib.path <- list.files(path = "/usr/hdp/current/hive-client/lib", pattern = "jar", full.names = TRUE)
mapred.class.path <- list.files(path = "/usr/hdp/current/hadoop-mapreduce-client/lib", pattern = "jar", full.names = TRUE)
cp <- c(hive.class.path, hadoop.lib.path, mapred.class.path, hadoop.class.path)

# Hand the assembled class path to the driver so that it can load its dependencies.
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", classPath = cp, identifier.quote = "`")

# Connect to HiveServer2 (URL, user, password) and run a query.
conn <- dbConnect(drv, "jdbc:hive2://ixxx:10000/default", "hive", "hive")
show_databases <- dbGetQuery(conn, "show databases")
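
The class path assembly above can also be factored into a small helper. The sketch below is one possible way to do it, not part of the Cloudera example: `collect_jars` is a hypothetical name, and dropping repeated JAR file names is an assumption that will not help when two directories hold different versions of the same library under different names.

```r
# Collect JAR files from several directories for use as a JDBC class path.
collect_jars <- function(dirs) {
  # Skip directories that do not exist on this machine
  dirs <- dirs[dir.exists(dirs)]
  # Match only names ending in ".jar", not every name that contains "jar"
  jars <- unlist(lapply(dirs, list.files, pattern = "\\.jar$", full.names = TRUE))
  jars <- as.character(jars)
  # Keep the first occurrence of each JAR file name
  jars[!duplicated(basename(jars))]
}
```

The result can be passed as `classPath` to `JDBC()`, for example `cp <- collect_jars(c("/usr/hdp/current/hive-client/lib", "/usr/hdp/current/hadoop-mapreduce-client/lib"))`.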

Full disclosure: I am an employee of Cloudera.

answered Sep 30 '22 11:09 by Dennis Jaheruddin