
Loading CSV data into HBase [closed]

Tags:

hadoop

hbase

I am very new to Hadoop and HBase and have some conceptual questions that trip me up in every tutorial I've found.

I have Hadoop and HBase running on a single node inside an Ubuntu VM on my Windows 7 system. I have a CSV file that I would like to load into a single HBase table.

The columns are: loan_number, borrower_name, current_distribution_date, loan_amount

I know that I need to write a MapReduce job to load this CSV file into HBase. The following tutorial describes the Java needed to write this MapReduce job: http://salsahpc.indiana.edu/ScienceCloud/hbase_hands_on_1.htm

What I'm missing is:

Where do I save these files and where do I compile them? Should I compile them on my Windows 7 machine running Visual Studio 2012 and then move the result to the Ubuntu VM?

I read this SO question and its answers, but I guess I'm still missing the basics: Loading CSV File into Hbase table using MapReduce

I can't find anything covering these basic Hadoop/HBase logistics. Any help would be greatly appreciated.

bjoern asked Dec 17 '12 00:12


People also ask

What type of data can HBase store?

There are no data types in HBase; data is stored as byte arrays in the cells of an HBase table. The value in a cell is versioned by the timestamp at which it was stored, so each cell of an HBase table may contain multiple versions of data.


1 Answer

There is no need to code a MapReduce job to bulk load data into HBase. There are several ways to bulk load data into HBase:

1) Use HBase tools like importtsv and completebulkload http://hbase.apache.org/book/arch.bulk.load.html
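A minimal sketch of the importtsv route for this question's CSV. The table name `loans`, column family `cf`, and the HDFS paths are placeholders; the table must already exist (e.g. `create 'loans', 'cf'` in the HBase shell), and `loan_number` is mapped to the row key via `HBASE_ROW_KEY`:

```shell
# Step 1: generate HFiles from the CSV (comma-separated, columns per the question)
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.separator=, \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:borrower_name,cf:current_distribution_date,cf:loan_amount \
  -Dimporttsv.bulk.output=hdfs:///tmp/loans-hfiles \
  loans hdfs:///tmp/loans.csv

# Step 2: hand the generated HFiles to the running table (completebulkload)
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  hdfs:///tmp/loans-hfiles loans
```

If you drop the `-Dimporttsv.bulk.output` option, ImportTsv writes directly to the table via Puts instead of producing HFiles, which is simpler but slower for large loads.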

2) Use Pig to bulk load data. Example:

A = LOAD '/hbasetest.txt' USING PigStorage(',') as 
      (strdata:chararray, intdata:long);
STORE A INTO 'hbase://mydata'
        USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
              'mycf:intdata');

3) Do it programmatically using the HBase API. I have a small project called hbaseloader that loads files into an HBase table (the table has just one ColumnFamily holding the content of the file). Take a look at it; you just need to define the structure of your table and modify the code to read a CSV file and parse it.
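A rough sketch of what that client-API approach looks like for this CSV, using the HBase 0.9x-era client (the `HTable`/`Put` style current when this was asked). The table name `loans`, column family `cf`, and file path are assumptions; create the table first with `create 'loans', 'cf'` in the HBase shell:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: read loans.csv line by line and insert one HBase row per line.
public class CsvToHBase {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "loans");
        BufferedReader in = new BufferedReader(new FileReader("loans.csv"));
        String line;
        while ((line = in.readLine()) != null) {
            String[] f = line.split(",");
            Put put = new Put(Bytes.toBytes(f[0]));  // loan_number as the row key
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("borrower_name"), Bytes.toBytes(f[1]));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("current_distribution_date"), Bytes.toBytes(f[2]));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("loan_amount"), Bytes.toBytes(f[3]));
            table.put(put);
        }
        in.close();
        table.close();
    }
}
```

Compile this on the Ubuntu VM itself (with the Hadoop and HBase jars on the classpath), not in Visual Studio on Windows; the job has to run where it can reach your HBase instance.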

4) Do it programmatically using a MapReduce job, as in the example you mentioned.
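For completeness, the mapper in such a job typically just parses each CSV line and emits a `Put`, roughly like the sketch below (class and column names are placeholders following the linked tutorial's pattern; the driver would wire this to `TableOutputFormat` for the `loans` table):

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-only job: each input line of the CSV becomes one Put,
// which TableOutputFormat writes into the HBase table.
public class CsvMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] f = value.toString().split(",");
        Put put = new Put(Bytes.toBytes(f[0]));  // loan_number as the row key
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("borrower_name"), Bytes.toBytes(f[1]));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("current_distribution_date"), Bytes.toBytes(f[2]));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("loan_amount"), Bytes.toBytes(f[3]));
        context.write(new ImmutableBytesWritable(Bytes.toBytes(f[0])), put);
    }
}
```

For a one-off load of a single CSV this is overkill compared with options 1-3, but it is the pattern the linked tutorial walks through.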

Diego Pino answered Oct 06 '22 01:10