In the HBase shell, the help text shows that there are several allowable syntaxes for creating a table:
create 'tableName', {NAME => 'colFamily', VERSIONS => 5 }
create 'tableName', {NAME => 'cf1'}, {NAME => 'cf2'}
create 'tableName', 'cf1', 'cf2', 'cf3'
create 'tableName', 'cf1', {SPLITS => ['10','20','30','40']}
I want to create a table where I specify both split points and some options, such as COMPRESSION => 'SNAPPY' and VERSIONS, but I can't figure out the syntax or find useful documentation.
Apache HBase distributes its load through region splitting. HBase stores rows in tables, and each table is split into 'regions'. Those regions are distributed across the cluster, where RegionServer processes host them and make them available to client processes.
SPLITS (split-point1, split-point2, ..., split-pointN) gives the row-key values at which the table is pre-split: each split point is the start key of a new region. The total number of regions that are created is the number of split keys plus one.
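To make the "split keys plus one" rule concrete, here is a minimal Python sketch (plain Python, not HBase code) that turns a list of split points into region key ranges and finds which region a given row key would land in. It assumes row keys compare as raw bytes in lexicographic order, as HBase row keys do, and it sorts the split points itself; the function names are hypothetical.

```python
from bisect import bisect_right


def region_boundaries(split_keys):
    """Return (start, end) row-key ranges for a table pre-split at split_keys.

    None stands for the open-ended start of the first region and the
    open-ended end of the last region. Keys are compared as UTF-8 bytes.
    """
    keys = sorted(k.encode() for k in split_keys)  # assumption: sort the split points
    starts = [None] + keys
    ends = keys + [None]
    return list(zip(starts, ends))


def region_index(split_keys, row_key):
    """Return the index of the region that would hold row_key."""
    keys = sorted(k.encode() for k in split_keys)
    # A split key is the *start* key of its region, so a row equal to a
    # split key belongs to the region beginning there (hence bisect_right).
    return bisect_right(keys, row_key.encode())


if __name__ == "__main__":
    splits = ["333", "666", "FOO"]
    for start, end in region_boundaries(splits):
        print(start, end)
    print(region_index(splits, "BAR"))
```

With the split points used later in this thread, three keys yield four regions. Note that 'FOO' sorts after '666' because the byte for 'F' is greater than the byte for '6', so a row key such as 'BAR' falls in the third region (index 2).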
What eventually became clear after experimentation was that the shell syntax accepts a set of column family dictionaries, and the SPLITS dictionary is really its own animal (which makes sense, as it modifies the whole table, not just a particular column family).
So an additional useful example to have would be:
create 'tableName', {NAME => 'colFam', VERSIONS => 2, COMPRESSION => 'SNAPPY'}, {SPLITS => ['333','666','FOO']}
Note that the SPLITS dictionary is separate from the column family dictionary; presumably we could still enter several column family dictionaries and then end with the SPLITS dictionary.
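If you generate these commands from a script, a small helper keeps the ordering straight: one dictionary per column family, then the SPLITS dictionary last. This is a hypothetical Python sketch that only builds the command string (it does not talk to HBase). It assumes family options are passed as plain strings or integers, and it deliberately rejects other types rather than guess how the shell would parse them.

```python
def hbase_literal(value):
    """Render a str, int, or list as an HBase shell (Ruby) literal."""
    if isinstance(value, bool):
        # Hypothetical helper: booleans are rejected; pass them as strings instead.
        raise TypeError("pass booleans as strings, e.g. 'true'")
    if isinstance(value, str):
        # Ruby single-quoted strings only need \\ and \' escaped.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, list):
        return "[" + ", ".join(hbase_literal(v) for v in value) + "]"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def hbase_dict(options):
    """Render a dict as {KEY => value, ...}, keeping insertion order."""
    body = ", ".join(f"{key} => {hbase_literal(val)}" for key, val in options.items())
    return "{" + body + "}"


def build_create_command(table, families, splits=None):
    """Build a `create` command: column family dicts first, SPLITS dict last."""
    parts = [hbase_literal(table)]
    parts += [hbase_dict(family) for family in families]
    if splits:
        parts.append(hbase_dict({"SPLITS": list(splits)}))
    return "create " + ", ".join(parts)


if __name__ == "__main__":
    print(build_create_command(
        "tableName",
        [{"NAME": "colFam", "VERSIONS": 2, "COMPRESSION": "SNAPPY"}, {"NAME": "cf2"}],
        ["333", "666", "FOO"],
    ))
```

For the example above plus a second family, this prints `create 'tableName', {NAME => 'colFam', VERSIONS => 2, COMPRESSION => 'SNAPPY'}, {NAME => 'cf2'}, {SPLITS => ['333', '666', 'FOO']}`, which you could paste into the shell.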