
Speeding up HBase read response

I have a 4-node HBase v0.90.4-cdh3u3 cluster deployed on Amazon XLarge instances (16 GB RAM, 4 CPU cores), with an 8 GB heap (-Xmx) allocated to the HRegion servers and 2 GB to the datanodes. The HMaster, ZooKeeper, and Namenode run on a separate XLarge instance. The target dataset is 100 million records (each record is 10 fields of 100 bytes). The benchmark runs concurrently from 100 parallel threads.

I'm confused by the read latency I get compared to what the YCSB team reported in their YCSB paper. They achieved throughput of up to 7000 ops/sec with a latency of 15 ms (page 10, read latency chart). I can't get throughput higher than 2000 ops/sec on a 90% reads / 10% writes workload. Writes are really fast with auto commit disabled (response within a few ms), while read latency doesn't go below 70 ms on average.

These are some of the HBase settings I used:

  • hbase.regionserver.handler.count=50
  • hfile.block.cache.size=0.4
  • hbase.hregion.max.filesize=1073741824
  • hbase.regionserver.codecs=lzo
  • hbase.hregion.memstore.mslab.enabled=true
  • hfile.min.blocksize.size=16384
  • hbase.hregion.memstore.block.multiplier=4
  • hbase.regionserver.global.memstore.upperLimit=0.35
  • hbase.zookeeper.property.maxClientCnxns=100

Which settings do you recommend looking at or tuning to speed up reads with HBase?

S B asked Apr 06 '12 07:04





1 Answer

Upgrading to a newer stable version will help. Any release from 0.92 onward includes the newer HFile v2 format, which can really help.

  • 0.94 has been released and has had a few point releases.
  • If you prefer a CDH build, CDH 4.1 ships a 0.92.1-based HBase.

Creating the table pre-split, with bloom filters enabled, can really help. I would also try lowering the number of handlers a little: http://archive.cloudera.com/cdh4/cdh/4/hbase/book.html#perf.handlers
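
As a rough illustration of the pre-splitting idea, here is a minimal sketch that only computes evenly spaced split keys; it does not talk to HBase. It assumes YCSB-style row keys of the form "user" followed by decimal digits, spread roughly evenly over the leading four digits 1000..9999. The class name SplitKeys, the method splitKeys, and the 1000..9999 range are made up for this example and should be adjusted to the real key distribution.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitKeys {
    // Hypothetical helper: returns numRegions - 1 split keys such as "user3249".
    // Assumes row keys look like prefix + decimal digits and sort as plain
    // byte strings, so boundaries are spaced evenly over 1000..9999.
    static List<String> splitKeys(String prefix, int numRegions) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("need at least 2 regions to split");
        }
        List<String> keys = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            // Integer arithmetic: the i-th boundary is i/numRegions of the way through the range.
            int boundary = 1000 + (int) ((long) i * (9999 - 1000) / numRegions);
            keys.add(prefix + boundary);
        }
        return keys;
    }

    public static void main(String[] args) {
        // Example: 8 regions (an arbitrary count chosen for illustration).
        for (String key : splitKeys("user", 8)) {
            System.out.println(key);
        }
    }
}
```

The resulting strings, converted to bytes, could then be passed as the split keys when creating the table, for example through the HBaseAdmin createTable overload that accepts a byte[][] of split keys, with a row-level bloom filter enabled on the column family. Check both against the API of the HBase version you actually run.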

A read latency of 70 ms is far from what I would expect. Look into GC tuning, and make sure that all of your RegionServers are running and hosting regions of the table you are benchmarking.

eclark answered Oct 02 '22 02:10
