
What is the use of HCatalog in Hadoop?

I'm new to Hadoop. I know that HCatalog is a table and storage management layer for Hadoop, but how exactly does it work, and how do I use it? Please give a simple example.

Vijay_Shinde asked Mar 20 '14 13:03



2 Answers

In short, HCatalog opens up the Hive metadata to other MapReduce tools. Every MapReduce tool has its own notion of HDFS data (for example, Pig sees HDFS data as a set of files, while Hive sees it as tables). With a table-based abstraction, tools that support HCatalog do not need to care about where the data is stored (HBase or HDFS) or in which format.
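To make the table abstraction concrete, here is a minimal sketch that writes two Pig LOAD statements side by side. The warehouse path, the space delimiter, and the column names (name, marks) are assumptions made up for illustration; only the table name and the loader class come from this thread.

```shell
# Write two alternative Pig LOAD statements to a file for comparison.
# Hypothetical: the warehouse path, delimiter, and column names below.
COMPARE="${TMPDIR:-/tmp}/load_compare.pig"

cat > "$COMPARE" <<'EOF'
-- Without HCatalog: the script hard-codes location, delimiter, and schema.
raw = LOAD '/user/hive/warehouse/student' USING PigStorage(' ')
      AS (name:chararray, marks:int);

-- With HCatalog: only the table name is given; location, format, and
-- schema are looked up in the Hive metastore.
tbl = LOAD 'student' USING org.apache.hcatalog.pig.HCatLoader();
EOF

cat "$COMPARE"
```

If the table later moves or changes format, only the first form would need editing; the second keeps working as long as the metastore entry is updated.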

If you configure WebHCat alongside HCatalog, you can also submit jobs in a RESTful way.
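As a rough sketch of what the REST access can look like, the snippet below builds a few WebHCat request URLs and prints the matching curl commands. The host, the user name "hive", the table name, and the statusdir value are assumptions; 50111 is WebHCat's default port, and the resource paths sit under /templeton/v1/.

```shell
# Assumptions (hypothetical): WebHCat runs on localhost on its default
# port 50111, and requests are made as the user "hive".
WEBHCAT_HOST="${WEBHCAT_HOST:-localhost}"
WEBHCAT_PORT="${WEBHCAT_PORT:-50111}"
WEBHCAT_USER="${WEBHCAT_USER:-hive}"

# Build a WebHCat URL for the given resource path under /templeton/v1/.
webhcat_url() {
  printf 'http://%s:%s/templeton/v1/%s?user.name=%s\n' \
    "$WEBHCAT_HOST" "$WEBHCAT_PORT" "$1" "$WEBHCAT_USER"
}

# Print the requests instead of sending them, so this runs without a server.
# Server status check:
echo "curl -s '$(webhcat_url status)'"
# Describe the (assumed) table "student" in the default database:
echo "curl -s '$(webhcat_url ddl/database/default/table/student)'"
# Submit a Hive query as a job; statusdir is a made-up output directory:
echo "curl -s -d execute='SELECT * FROM student;' -d statusdir='student.output' '$(webhcat_url hive)'"
```

Copy a printed command into a terminal once WebHCat is reachable; the responses are JSON.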

Prabu Soundar Rajan answered Oct 19 '22 18:10


Here is a very basic example of how to use HCatalog.

I have a table in Hive named STUDENT, which is stored in an HDFS location:

neethu 90
malini 90
sunitha 98
mrinal 56
ravi 90
joshua 8

Now suppose I want to load this table into Pig for further transformation of the data. In this scenario I can use HCatalog.

When using table information from the Hive metastore with Pig, add the -useHCatalog option when invoking Pig:

pig -useHCatalog

(You may need to set HCAT_HOME first, for example: export HCAT_HOME=/usr/lib/hive-hcatalog/)

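Now load the table into Pig:

A = LOAD 'student' USING org.apache.hcatalog.pig.HCatLoader();

(In newer Hive releases the loader class lives under org.apache.hive.hcatalog.pig instead, so adjust the package name if the class is not found.)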

The table is now loaded into Pig. To check the schema, run DESCRIBE on the relation:

DESCRIBE A;
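The steps above can be collected into one script and run non-interactively. This is only a sketch: the column name marks and the threshold 90 are assumptions, since the post does not show the STUDENT schema, and the script path is arbitrary.

```shell
# Hypothetical: the column name "marks" and the threshold 90 are assumptions;
# the original post does not show the STUDENT schema.
SCRIPT="${TMPDIR:-/tmp}/student_demo.pig"

cat > "$SCRIPT" <<'EOF'
-- Load the Hive table through HCatalog (class name as used in this answer).
A = LOAD 'student' USING org.apache.hcatalog.pig.HCatLoader();
DESCRIBE A;

-- Keep only rows whose (assumed) marks column is at least 90.
B = FILTER A BY marks >= 90;
DUMP B;
EOF

# Run the script only if Pig is installed; otherwise report where it was written.
if command -v pig >/dev/null 2>&1; then
  pig -useHCatalog -f "$SCRIPT"
else
  echo "pig not found; script written to $SCRIPT"
fi
```

Because the schema comes from the metastore, the FILTER can refer to the column by name without an AS clause in the LOAD statement.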

Thanks

Neethu Lalitha answered Oct 19 '22 20:10
