I am very new to Hadoop and HBase and have some conceptual questions that are tripping me up during every tutorial I've found.
I have Hadoop and HBase running on a single node within an Ubuntu VM on my Windows 7 system. I have a CSV file that I would like to load into a single HBase table.
The columns are: loan_number, borrower_name, current_distribution_date, loan_amount
I know that I need to write a MapReduce job to load this CSV file into HBase. The following tutorial describes the Java needed to write this MapReduce job: http://salsahpc.indiana.edu/ScienceCloud/hbase_hands_on_1.htm
What I'm missing is:
Where do I save these files, and where do I compile them? Should I compile them on my Windows 7 machine running Visual Studio 12 and then move them to the Ubuntu VM?
I read this SO question and its answers, but I guess I'm still missing the basics: Loading CSV File into Hbase table using MapReduce
I can't find anything covering these basic Hadoop/HBase logistics. Any help would be greatly appreciated.
There are no data types in HBase; data is stored as byte arrays in the cells of an HBase table. The value in a cell is versioned by the timestamp at which it was stored, so each cell of an HBase table may contain multiple versions of the data.
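To make the byte-array-plus-versions model concrete, here is a toy sketch in Python. This is only an illustration of the concept, not real HBase code; the row key, column name, and timestamps are made up.

```python
# Toy model of HBase's storage layout: every value is a raw byte array,
# and each cell keeps multiple timestamped versions (newest first).
table = {}  # row key -> {column -> [(timestamp, value_bytes), ...]}

def put(row, column, value, ts):
    """Store a value; HBase stores raw bytes, so we encode here."""
    versions = table.setdefault(row, {}).setdefault(column, [])
    versions.insert(0, (ts, value.encode("utf-8")))

def get(row, column):
    """Return the newest version, which is what HBase returns by default."""
    return table[row][column][0][1]

put("loan-001", "cf:borrower_name", "Alice", ts=1)
put("loan-001", "cf:borrower_name", "Alicia", ts=2)
print(get("loan-001", "cf:borrower_name"))  # b'Alicia' -- older version kept too
```

A `get` without a timestamp returns the latest version, but the older ones remain retrievable until compaction discards them.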
There is no need to code a MapReduce job yourself to bulk load data into HBase. There are several ways to do it:
1) Use the HBase tools importtsv and completebulkload: http://hbase.apache.org/book/arch.bulk.load.html
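For your file, option 1 could look roughly like this. The table name `loans`, column family `cf`, and HDFS paths are assumptions; the table must already exist (e.g. `create 'loans', 'cf'` in the HBase shell), and importtsv defaults to tab separators, so the separator is overridden for CSV:

```shell
# Direct puts into the running table, using loan_number as the row key:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  '-Dimporttsv.separator=,' \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:borrower_name,cf:current_distribution_date,cf:loan_amount \
  loans /user/hadoop/loans.csv

# Or generate HFiles first, then bulk load them into the table:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  '-Dimporttsv.separator=,' \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:borrower_name,cf:current_distribution_date,cf:loan_amount \
  -Dimporttsv.bulk.output=/tmp/loans_hfiles \
  loans /user/hadoop/loans.csv
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/loans_hfiles loans
```

The second form (HFiles plus completebulkload) is much faster for large files because it bypasses the normal write path.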
2) Use Pig to bulk load data. Example:
A = LOAD '/hbasetest.txt' USING PigStorage(',') as
(strdata:chararray, intdata:long);
STORE A INTO 'hbase://mydata'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'mycf:intdata');
3) Do it programmatically using the HBase API. I have a small project called hbaseloader that loads files into an HBase table (the table has just one column family, which holds the content of the file). Take a look at it; you just need to define the structure of your table and modify the code to read a CSV file and parse it.
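The parsing step you would add is the same in any language. Here is a hedged sketch in Python (hbaseloader itself is Java), assuming loan_number is the row key and a single column family named `cf` — both are choices you make when you define the table, not requirements:

```python
import csv
import io

# Columns from the question's CSV, in file order.
COLUMNS = ["loan_number", "borrower_name",
           "current_distribution_date", "loan_amount"]

def csv_to_puts(csv_text):
    """Yield (row_key, {qualified_column: value}) pairs, one per CSV record,
    everything encoded to bytes the way HBase ultimately stores it."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=COLUMNS)
    for rec in reader:
        row_key = rec["loan_number"].encode("utf-8")
        cells = {("cf:" + c).encode("utf-8"): rec[c].encode("utf-8")
                 for c in COLUMNS[1:]}
        yield row_key, cells

sample = "1001,John Doe,2013-01-15,250000\n"
for row_key, cells in csv_to_puts(sample):
    print(row_key, cells[b"cf:loan_amount"])  # b'1001' b'250000'
```

In the Java version, each yielded pair would become one `Put` keyed by the row key, with one `add` per column.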
4) Do it programmatically using a MapReduce job, as in the tutorial you mentioned.
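If you do go the MapReduce route, note that a load like this is map-only: each map call handles one CSV line independently, and there is no reducer. A minimal sketch of the per-line mapper logic, in Python for brevity (a real job would build an HBase `Put` per line and write through `TableOutputFormat`, or run this under Hadoop Streaming with a stdin loop; the `cf` family name is an assumption):

```python
def map_line(line):
    """Turn one CSV record into row_key TAB column=value strings,
    mirroring what the tutorial's Java mapper does with Put objects."""
    loan_number, borrower, dist_date, amount = line.rstrip("\n").split(",")
    for col, val in [("cf:borrower_name", borrower),
                     ("cf:current_distribution_date", dist_date),
                     ("cf:loan_amount", amount)]:
        yield "%s\t%s=%s" % (loan_number, col, val)

for out in map_line("1001,John Doe,2013-01-15,250000"):
    print(out)
```

Because the work is embarrassingly parallel, this scales to however many mappers your input splits produce.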