How to connect to Hadoop/Hive from .NET

Tags: c#, hadoop, hive

I am working on a solution where I will have a Hadoop cluster with Hive running, and I want to send jobs and Hive queries from a .NET application to be processed and get notified when they are done. I can't find any way to interface with Hadoop other than directly from a Java app. Is there an API I can access that I am just not finding?

James Avery asked Aug 16 '10 14:08


6 Answers

It is possible to access Hive from C# by using Microsoft's ODBC connector. Download the NuGet package "Microsoft.Hadoop.Hive" and follow the example provided at http://msdn.microsoft.com/en-us/library/dn749834.aspx

The trick lies in building the connection string. The best way I came up with was to download the Microsoft Hive ODBC Driver (http://www.microsoft.com/en-us/download/details.aspx?id=40886), install it, and then use the Server Explorer inside Visual Studio to add a new connection and let it build the connection string for me. To do this, I used the following steps:

  • Change the data source to "Microsoft ODBC Data Source" and ensure you're using the ".NET Framework Data Provider for ODBC" as the data provider.

[Screenshot: Change Data Source dialog window]

  • Under the "Data source specification" portion, select "Use connection string", then click the "Build" button.

[Screenshot: Add Connection dialog window]

  • Under the "Machine Data Source" tab, select the "Sample Microsoft Hive DSN" data source name, then click the "OK" button.

[Screenshot: Select Data Source dialog window]

  • A window titled "Microsoft Hive ODBC Driver Connection Dialog" will open. Enter an optional description, then type in the path to your Hive server, the port you will be using, and what database it should connect to. Indicate the Hive Server Type, and specify an authentication mechanism to use, then fill out the appropriate fields.

[Screenshot: Microsoft Hive ODBC Driver Connection Dialog window]

  • Finally, click the "Test" button at the bottom to ensure that you're able to connect successfully. If the test succeeds, click the "OK" button and you'll be back in the "Modify Connection" window. Enter the login information for your Hive service here.

Either use this data source directly or copy the connection string that it has built for you and use it within your application.
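
Once the connection string exists, using it from code could look roughly like the sketch below. It relies only on the standard System.Data.Odbc classes. The class name HiveOdbcExample, both method names, and the UID/PWD keywords are assumptions made for illustration; check the connection string that Visual Studio generated for the exact keywords your driver version expects.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Odbc;

// Hypothetical helper class, not part of any Microsoft or Hive package.
public static class HiveOdbcExample
{
    // Builds an ODBC connection string from a DSN name plus credentials.
    // UID and PWD are assumed keyword names; verify them against your driver.
    public static string BuildConnectionString(string dsn, string user, string password)
    {
        var builder = new OdbcConnectionStringBuilder { Dsn = dsn };
        builder["UID"] = user;
        builder["PWD"] = password;
        return builder.ConnectionString;
    }

    // Runs a query and returns the first column of every row as text.
    public static List<string> QueryFirstColumn(string connectionString, string sql)
    {
        var results = new List<string>();
        using (var connection = new OdbcConnection(connectionString))
        using (var command = new OdbcCommand(sql, connection))
        {
            connection.Open();
            using (OdbcDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    results.Add(Convert.ToString(reader.GetValue(0)));
                }
            }
        }
        return results;
    }
}
```

For example, HiveOdbcExample.QueryFirstColumn(HiveOdbcExample.BuildConnectionString("Sample Microsoft Hive DSN", "hiveuser", "secret"), "SHOW TABLES") would list the table names, assuming the DSN from the steps above is configured and the placeholder credentials are replaced with real ones.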

Whit Waldo answered Oct 05 '22 05:10


With Hadoop: there is no straightforward way to connect from C#, because Hadoop's communication tier works with Java only and is not cross-platform. It is probably possible, but only in very non-trivial ways. I know there is a patch to add Protocol Buffers support to Hadoop, but at the time of writing (Aug 2011) it has not been released yet.

With Hive the situation is better, because Hive has a Thrift interface, which supports C#. You can download the Hive Thrift interfaces and generate a C# client on your own, but beware that this requires some hacking of the generated code. Instead, I would recommend downloading the DLL from https://bitbucket.org/vadim/hive-sharp/downloads/hive-sharp-lib.dll or using the NuGet package manager and searching for "hive": http://nuget.org/List/Packages/Hive.Sharp.Lib Disclaimer: I'm the author.

Vadym Chekan answered Oct 05 '22 07:10


Apparently it is possible to connect to Hadoop with non-Java solutions. See "Do I have to write my application in Java?"

Matthew Hegarty answered Oct 05 '22 06:10


  1. There is a Hortonworks ODBC driver. I haven't used it personally, but it should let you work with Hive as with any other ODBC data source. You can use the OdbcConnection class to connect to Hive once the ODBC driver is installed.

  2. As noted in other answers, you can use the Thrift API. For that you need to generate C# classes from the interface definition files, which you can download from the Hive source repository. This approach works for me.

  3. You can use IKVM to convert the Hadoop client Java libraries into .NET assemblies, which you can then use from C#. I haven't used IKVM with the Hive client, but I've IKVMed another Hadoop client library and, surprisingly, it worked.

EDIT:

  4. There's also Apache Templeton, which allows submitting Hive jobs (and Pig and MapReduce jobs) through a REST interface. The problem with it is that it spawns another map task to submit the Hive job, which makes it slower.
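
A Templeton (WebHCat) submission from C# might be sketched as below. This is an illustration under assumptions, not a tested client. The templeton/v1/hive path and the execute, statusdir, callback, and user.name parameters follow the WebHCat documentation as I remember it, and some versions expect user.name in the query string instead of the form body. The class and method names are hypothetical. The callback URL, which the server is meant to call when the job completes, is one way to get the "notify me when done" behavior asked about in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper class for illustration only.
public static class TempletonExample
{
    // Collects the form fields for a Hive job submission.
    // The callback field is added only when a URL is supplied.
    public static Dictionary<string, string> BuildHiveJobForm(
        string user, string query, string statusDir, string callbackUrl = null)
    {
        var form = new Dictionary<string, string>
        {
            ["user.name"] = user,      // assumed: pseudo authentication
            ["execute"] = query,       // the Hive statement to run
            ["statusdir"] = statusDir  // HDFS directory for job status and output
        };
        if (!string.IsNullOrEmpty(callbackUrl))
        {
            form["callback"] = callbackUrl;
        }
        return form;
    }

    // Posts the form to <templetonBase>/templeton/v1/hive and returns the
    // raw response body, which is expected to be JSON containing the job id.
    public static async Task<string> SubmitHiveJobAsync(
        HttpClient client, Uri templetonBase, Dictionary<string, string> form)
    {
        var endpoint = new Uri(templetonBase, "templeton/v1/hive");
        using (var content = new FormUrlEncodedContent(form))
        using (HttpResponseMessage response = await client.PostAsync(endpoint, content))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

The base URI should end with a slash, for example new Uri("http://your-hadoop-host:50111/"), where the host name is a placeholder and 50111 is the usual default WebHCat port.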

Sergey Zyuzin answered Oct 05 '22 07:10


The Thrift API is another way for other languages to access HDFS and Hive.

zjffdu answered Oct 05 '22 06:10


See if this helps. I have tried to connect to Hadoop via C#:

"How to communicate to Hadoop via Hive using .NET/C#"

rajibdotnet answered Oct 05 '22 07:10