I need to join tables in HBase.
I have integrated Hive with HBase and that works well; I can run queries through Hive.
But can somebody help me join tables in HBase without using Hive? I think this can be done with MapReduce. If so, can anybody share a working example that I can refer to?
Please share your opinions.
I have an approach in mind:
If I need to join tables A x B x C, I could use TableMapReduceUtil to iterate over A, then fetch the matching data from B and C inside the TableMapper, and finally use the TableReducer to write the result to another table Y.
Would this be a good approach?
That is certainly an approach, but if you do two random reads (one against B and one against C) for every scanned row, your speed will plummet. If you filter out most of the rows first, or table A is small, that may not be an issue.
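To make the cost concrete, here is a minimal sketch of the lookup approach from the question. It is not HBase code: plain Python dicts stand in for the tables, and the column names (b_id, c_id, b_info, c_info) and the lookup_join helper are hypothetical. It only shows that every row scanned from A triggers two point lookups.

```python
# In-memory stand-ins for HBase tables: row key -> row (hypothetical data).
table_a = {
    "a1": {"b_id": "b1", "c_id": "c1"},
    "a2": {"b_id": "b2", "c_id": "c1"},
}
table_b = {"b1": {"b_info": "first"}, "b2": {"b_info": "second"}}
table_c = {"c1": {"c_info": "shared"}}


def lookup_join(a, b, c):
    """Scan A once; do one point lookup in B and one in C per scanned row."""
    joined, lookups = {}, 0
    for a_key in sorted(a):            # a scan returns rows in row-key order
        a_row = a[a_key]
        b_row = b.get(a_row["b_id"])   # random read 1 (a Get in real HBase)
        c_row = c.get(a_row["c_id"])   # random read 2
        lookups += 2
        if b_row is None or c_row is None:
            continue                   # inner join: drop rows without a match
        joined[a_key] = {**a_row, **b_row, **c_row}
    return joined, lookups


table_y, random_reads = lookup_join(table_a, table_b, table_c)
```

The number of random reads grows as 2 x (rows scanned from A), which is why this only works well when A is small or heavily filtered.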
However, the best approach, which will be available in HBase 0.96, is the MultipleTableInput method. With it, a single job scans both tables: the mapper for table A writes its output under a key that the output from table B can be matched against.
E.g. Table A emits (b_id, a_info) and Table B emits (b_id, b_info); records with the same b_id are then merged together in the reducer.
This is an example of a sort-merge join.
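The emit-and-merge flow can be sketched as follows. This is a plain-Python simulation, not the HBase MapReduce API: the shuffle function is a stand-in for the framework's sort and group phase, and the table contents, the "A"/"B" tags, and all function names are hypothetical.

```python
from collections import defaultdict


def map_table_a(row_key, row):
    # Table A emits (b_id, a_info), tagged so the reducer knows the source.
    yield row["b_id"], ("A", row["a_info"])


def map_table_b(row_key, row):
    # Table B's row key is b_id, so it emits (b_id, b_info).
    yield row_key, ("B", row["b_info"])


def shuffle(pairs):
    # Stand-in for the framework: group values by key, keys in sorted order.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_join(b_id, values):
    # Inner join: pair every A record with every B record sharing this key.
    a_infos = [v for tag, v in values if tag == "A"]
    b_infos = [v for tag, v in values if tag == "B"]
    for a_info in a_infos:
        for b_info in b_infos:
            yield b_id, a_info, b_info


table_a = {
    "a1": {"b_id": "b1", "a_info": "x"},
    "a2": {"b_id": "b1", "a_info": "y"},
    "a3": {"b_id": "b9", "a_info": "z"},   # no matching row in B
}
table_b = {"b1": {"b_info": "first"}, "b2": {"b_info": "second"}}

pairs = [kv for k, r in table_a.items() for kv in map_table_a(k, r)]
pairs += [kv for k, r in table_b.items() for kv in map_table_b(k, r)]
joined = [
    out
    for b_id, values in shuffle(pairs).items()
    for out in reduce_join(b_id, values)
]
```

Both tables are read with sequential scans only; the matching work is moved into the shuffle, so there are no per-row random reads.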
If you are joining on the row key, or the joining attribute is sorted in the same order as table B, you can keep an instance of a scanner in each task that reads sequentially from table B until it finds what it is looking for.
E.g. Table A row key = "companyId" and Table B row key = "companyId_employeeId". Then for each company in Table A you can get all of its employees using the nested-loop algorithm.
    for company in TableA:
        for employee in TableB:
            if employee.company_id == company.id:
                emit(company.id, employee)
This is an example of a nested-loop join.
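Because both tables are sorted by row key, the inner loop does not have to restart for every company. Below is a minimal sketch of the single-scanner variant described above, again with dicts standing in for the tables. The scan and join_companies_employees helpers and the sample rows are hypothetical, and it assumes company IDs are fixed-width and contain no "_", so that the order of A's keys matches the order of B's key prefixes.

```python
def scan(table):
    """Yield (row_key, row) in row-key order, as an HBase scan would."""
    for key in sorted(table):
        yield key, table[key]


def join_companies_employees(table_a, table_b):
    """One forward pass over each table; B's scanner is never rewound.

    Assumes fixed-width company IDs without "_" (see the note above).
    """
    b_scanner = scan(table_b)
    pending = next(b_scanner, None)
    for company_id, _company in scan(table_a):
        prefix = company_id + "_"
        # Skip B rows that sort before this company's key prefix.
        while pending is not None and pending[0] < prefix:
            pending = next(b_scanner, None)
        # Emit every employee row that carries this company's prefix.
        while pending is not None and pending[0].startswith(prefix):
            yield company_id, pending[1]
            pending = next(b_scanner, None)


table_a = {"c01": {"name": "Acme"}, "c02": {"name": "Globex"}, "c03": {"name": "Initech"}}
table_b = {
    "c00_e9": {"employee": "orphan"},   # no matching company in A
    "c01_e1": {"employee": "alice"},
    "c01_e2": {"employee": "bob"},
    "c03_e1": {"employee": "carol"},
}

result = [(cid, row["employee"]) for cid, row in join_companies_employees(table_a, table_b)]
```

Each table is read once from start to finish, so the cost is roughly one scan of A plus one scan of B rather than a scan of B per company.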