I am working on a solution where I will have a Hadoop cluster with Hive running and I want to send jobs and hive queries from a .NET application to be processed and get notified when they are done. I can't find any solutions for interfacing with Hadoop other than directly from a Java app, is there an API I can access that I am just not finding?
The Sisense Hive connector is a certified connector that allows you to import data from the Apache Hadoop Hive API into Sisense via the Sisense generic JDBC connector. The connector offers the most natural way to connect to Apache Hadoop Hive data and provides additional powerful features.
One way to bypass this is run the Hive JDBC/Thrift server on the box that has the Hadoop infrastructure — that is, to run the hive program with command-line options to run it as a Hive-server on the desired port and so on — and then connect to it using your favorite JDBC-supporting SQL client.
To setup a new Hadoop filesystem connection, go to Administration → Connections → New connection → HDFS. A HDFS connection in DSS consists of : a root path, under which all the data accessible through that connection resides.
Apache Hive is an open source data warehouse software for reading, writing and managing large data set files that are stored directly in either the Apache Hadoop Distributed File System (HDFS) or other data storage systems such as Apache HBase.
It is possible to access Hive utilizing C# by making use of Microsoft's ODBC connector. Download the Nuget package for "Microsoft.Hadoop.Hive" and follow the example provided at http://msdn.microsoft.com/en-us/library/dn749834.aspx
The trick lies in building the connection string to connect with it. The best way I came up with was to download the Microsoft Hive ODBC Driver (http://www.microsoft.com/en-us/download/details.aspx?id=40886), install it, then use the Server Explorer inside Visual Studio to add a new connection, then build the connection string for me. To do this, I used the following steps:
Either utilize this data source or copy the connection string that it's built for you and use it within your application.
With Hadoop: there is no straight way to connect from C# because Hadoop communication tier is working with java only and is not cross platform. It is probably possible but in very non-trivial ways. I know there is a patch to add Protocol Buffers support for Hadoop but at the moment of writing (Aug 2011) is is not released yet.
With Hive situation is better because Hive has Thrift interface which supports C#. You can download Hive Thrift interfaces and generate C# client on your own but beware that it requires some hacking of generated code. Instead I would recommend you downloading dll from https://bitbucket.org/vadim/hive-sharp/downloads/hive-sharp-lib.dll or use Nuget package manager, search for "hive": http://nuget.org/List/Packages/Hive.Sharp.Lib Disclaimer: I'm the author.
Apparently it is possible to connect to Hadoop with non-Java solutions - see Do I have to write my application in Java?
There is Hortonworks ODBC driver. I havn't used it personally, but it shall let you work with hive as with any other ODBC datasource. You can use OdbcConnection class to connect to Hive once ODBC driver is installed.
As noted in other answers - you can use Thrift api. For that you need to generate C# classes from interface definition files, which you can download from Hive source repository. This approach works for me.
You can use IKVM, to convert hadoop client java libraries into .Net assemblies which you can use from C#. I havn't used IKVM with Hive client, but I've IKVMed some other hadoop client library and surprisingly it worked.
EDIT:
Thrift API is also another way for other language to access hdfs and hive
See if this helps. I have tried to connect to Hadoop via C#
How to communicate to Hadoop via Hive using .NET/C#
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With