Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to design Hbase schema?

suppose that I have this RDBM table (Entity-attribute-value_model):

col1: entityID
col2: attributeName
col3: value

and I want to use HBase due to scaling issues.

I know that the only way to access Hbase table is using a primary key (cursor). you can get a cursor for a specific key, and iterate the rows one-by-one .

The issue is, that in my case, I want to be able to iterate on all 3 columns. for example :

  • for a given an entityID I want to get all its attriutes and values
  • for a give attributeName and value I want to all the entitiIDS ...

so one idea I had is to build one Hbase table that will hold the data (table DATA, with entityID as primary index), and 2 "index" tables one with attributeName as a primary key, and the other one with value

each index table will hold a list of pointers (entityIDs) for the DATA table.

Is it a reasonable approach ? or is is an 'abuse' of Hbase concepts ?

In this blog the author say:

HBase allows get operations by primary key and scans (think: cursor) over row ranges. (If you have both scale and need of secondary indexes, don’t worry - Lucene to the rescue! But that’s another post.)

Do you know how Lucene can help ?

-- Yonatan

like image 330
Yonatan Maman Avatar asked Dec 17 '08 16:12

Yonatan Maman


People also ask

What is HBase schema?

HBase is a column-oriented database and the tables in it are sorted by row. The table schema defines only column families, which are the key value pairs.

Which of the following is good design practice of HBase?

15) What is the best practice on deciding the number of column families for HBase table? It is ideal not to exceed the number of columns families per HBase table by 15 because every column family in HBase is stored as a single file, so large number of columns families will be required to read and merge multiple files.

Is HBase good for structured data?

It is well suited for real-time data processing or random read/write access to large volumes of data. Unlike relational database systems, HBase does not support a structured query language like SQL; in fact, HBase isn't a relational data store at all.

What is Rowkey in HBase?

A row key is a unique identifier for the table row. An HBase table is a multi-dimensional map comprised of one or more columns and rows of data. You specify the complete set of column families when you create an HBase table.


1 Answers

Secondary indexes would indeed be useful for many potential applications of HBase, and I believe the developers are in fact looking at it. Checkout http://www.mail-archive.com/[email protected]/msg04801.html.

In the mean time though, if your application data storage can be modelled as a star schema (see http://en.wikipedia.org/wiki/Star_schema) you might like to checkout the solution that Hypertable proposes for secondary index-type needs http://markmail.org/message/rphm4q6cbar2ycgp

like image 100
The D Williams Avatar answered Sep 21 '22 00:09

The D Williams