suppose that I have this RDBM table (Entity-attribute-value_model):
col1: entityID
col2: attributeName
col3: value
and I want to use HBase due to scaling issues.
I know that the only way to access Hbase table is using a primary key (cursor). you can get a cursor for a specific key, and iterate the rows one-by-one .
The issue is, that in my case, I want to be able to iterate on all 3 columns. for example :
so one idea I had is to build one Hbase table that will hold the data (table DATA, with entityID as primary index), and 2 "index" tables one with attributeName as a primary key, and the other one with value
each index table will hold a list of pointers (entityIDs) for the DATA table.
Is it a reasonable approach ? or is is an 'abuse' of Hbase concepts ?
In this blog the author say:
HBase allows get operations by primary key and scans (think: cursor) over row ranges. (If you have both scale and need of secondary indexes, don’t worry - Lucene to the rescue! But that’s another post.)
Do you know how Lucene can help ?
-- Yonatan
HBase is a column-oriented database and the tables in it are sorted by row. The table schema defines only column families, which are the key value pairs.
15) What is the best practice on deciding the number of column families for HBase table? It is ideal not to exceed the number of columns families per HBase table by 15 because every column family in HBase is stored as a single file, so large number of columns families will be required to read and merge multiple files.
It is well suited for real-time data processing or random read/write access to large volumes of data. Unlike relational database systems, HBase does not support a structured query language like SQL; in fact, HBase isn't a relational data store at all.
A row key is a unique identifier for the table row. An HBase table is a multi-dimensional map comprised of one or more columns and rows of data. You specify the complete set of column families when you create an HBase table.
Secondary indexes would indeed be useful for many potential applications of HBase, and I believe the developers are in fact looking at it. Checkout http://www.mail-archive.com/[email protected]/msg04801.html.
In the mean time though, if your application data storage can be modelled as a star schema (see http://en.wikipedia.org/wiki/Star_schema) you might like to checkout the solution that Hypertable proposes for secondary index-type needs http://markmail.org/message/rphm4q6cbar2ycgp
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With