Does HBase impose a maximum size per row?

Tags: hbase, mapr

High-Level Question:

Does HBase impose a maximum size per row which is common to all distributions (and thus not an artifact of implementation), either in terms of bytes-stored or in terms of number of cells?

If so:

  • What is the limit?

  • What is the reason the limit exists?

  • Where is the limit documented?

If not:

  • Is documentation (or results of a test) available demonstrating the ability of HBase to handle rows in excess of 2GB? 4GB?

  • Is there a practical or "best practice" maximum under which HBase API users should keep row sizes in order to avoid severe performance degradation? If so, what kind of performance degradation can occur if that guidance is discarded?

In either case:

  • Does the answer depend on the HBase version in question?

Background:

  • At least one implementation of the HBase API does appear to impose a limit; MapR Tables, which uses MapR's proprietary MapR-FS as the storage layer underlying the tables, appears to impose a hard limit of 2GB per row and a configurable soft limit which defaults to 32MB. Do other popular implementations of the HBase API also impose such a restriction?
  • This Quora response from HBase committer Todd Lipcon in 2011 suggests the absence of a limit in terms of number of cells. However, it also indicates that "the unit of load balancing and distribution is the region, and a row will never be split across regions". Does the requirement that a row exist within a single region impose either a hard limit on the row size, or a practical limit, past which performance degradation becomes severe?
asked Dec 25 '22 04:12 by sumitsu


1 Answer

A row is never split across regions, so an entire row must fit into a single region in order to be assigned to a region server and replicated. The size at which a region is split is configured by "hbase.hregion.max.filesize".

This page gives 10 GB as the default: http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/

The HBase reference guide says it can be set as high as 100 GB:

"To disable automatic splitting, set hbase.hregion.max.filesize to a very large value, such as 100 GB. It is not recommended to set it to its absolute maximum value of Long.MAX_VALUE." http://hbase.apache.org/book.html#important_configurations
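
As a rough illustration of the sizing argument, here is a minimal Python sketch that compares an estimated row size against the region split threshold. The function names and the cell-count-times-average-size estimate are assumptions made for this example and are not part of any HBase API. The 10 GB figure is the default cited above.

```python
# Hypothetical helpers for illustration only; none of this is an HBase API.
GIB = 1024 ** 3

# Split threshold cited in the answer for hbase.hregion.max.filesize (10 GB).
DEFAULT_MAX_FILESIZE = 10 * GIB


def estimate_row_bytes(num_cells: int, avg_cell_bytes: int) -> int:
    """Rough row size: number of cells times an assumed average cell size."""
    return num_cells * avg_cell_bytes


def row_fits_region(row_bytes: int, max_filesize: int = DEFAULT_MAX_FILESIZE) -> bool:
    """True if the row alone stays at or under the region split threshold."""
    return row_bytes <= max_filesize


if __name__ == "__main__":
    # Assumed workload: 50 million cells of about 100 bytes each in one row.
    row = estimate_row_bytes(num_cells=50_000_000, avg_cell_bytes=100)
    print(row, row_fits_region(row))
```

A False result here only means the single row is larger than the split threshold. Since a row is never split across regions, such a region could not be split any smaller than that row.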

answered Feb 04 '23 02:02 by halil