I am working on a project that stores key/value information about users in HBase. We are in the process of redesigning the HBase schema we are using. The two options being discussed are:

1. Serialize all of a user's key/value pairs into a single value (e.g. with Avro or Thrift) stored in one column.
2. Store each key as its own column qualifier, with its value in the corresponding cell.
What are the design tradeoffs of the two approaches? Is one preferable to the other? Are there any reasons not to store the data using Avro or Thrift?
Apache Avro™ is the leading serialization format for record data, and first choice for streaming data pipelines. It offers excellent schema evolution, and has implementations for the JVM (Java, Kotlin, Scala, …), Python, C/C++/C#, PHP, Ruby, Rust, JavaScript, and even Perl.
Serialization is the process of translating data structures or an object's state into binary or textual form, either to transport the data over a network or to store it in persistent storage. Once the data has been transported over the network or retrieved from persistent storage, it needs to be deserialized again.
When we need to store a large set of data on disk, we use Avro, since it helps conserve space. Moreover, we get better remote data transfer throughput using Avro for RPC, since Avro produces smaller binary output than Java serialization.
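To make the size difference concrete, here is a minimal sketch (not actual Avro; the record, field names, and values are made up for illustration) comparing a plain-text encoding of a small record with a compact binary encoding of the same fields. Like an Avro binary record, the binary form carries no field names on the wire; a real Avro reader would recover them from the schema.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class SizeDemo {
    public static void main(String[] args) throws IOException {
        long userId = 123456789L;
        int age = 34;
        boolean active = true;

        // Text form: roughly what a JSON-like encoding would put on the wire,
        // field names included.
        String text = "{\"userId\":123456789,\"age\":34,\"active\":true}";
        int textSize = text.getBytes(StandardCharsets.UTF_8).length;

        // Binary form: fixed-width fields, no field names on the wire.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeLong(userId);    // 8 bytes
        out.writeInt(age);        // 4 bytes
        out.writeBoolean(active); // 1 byte
        out.flush();
        int binarySize = buf.size();

        System.out.println("text bytes:   " + textSize);
        System.out.println("binary bytes: " + binarySize);
    }
}
```

For this toy record the binary form is 13 bytes against 43 for the text form; real Avro does even better on repeated records because the schema is stored once, not per row.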
In summary, I lean towards using distinct columns per key, for the following reasons:
1) You are requiring clients to use Avro/Thrift, which is another dependency. This dependency may rule out certain tooling, such as BI tools that expect to find values in the data without transformation.
2) Under the Avro/Thrift scheme, you are pretty much forced to bring the entire value across the wire. Depending on how much data is in a row, this may not matter. But if you are only interested in the 'city' field/column qualifier, you still have to fetch 'payments', 'credit-card-info', etc. This may also pose a security issue.
3) Updates, if required, will be more challenging with Avro/Thrift. Example: you decide to add a 'hasIphone6' key. With Avro/Thrift, you are forced to read the entire value, deserialize it, add the field, reserialize it, and write the whole value back. Under the column scheme, a new cell is appended containing only the new column. For a single row this is not a big deal, but if you do it to a billion rows, a large compaction operation will follow.
4) If configured, compression in HBase may beat the Avro/Thrift serialization in size, since it can compress across a column family instead of just within a single record.
5) BigTable-style implementations like HBase handle very wide, sparse tables well, so there won't be the performance hit you might expect from having many columns.
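The tradeoffs in points 2) and 3) can be sketched with a toy model (plain Java maps, not the real HBase client API; all field names and the `key=value;` blob format here are invented for illustration). The column scheme lets you read or add a single qualifier; the blob scheme forces a full decode/re-encode cycle for the same operations.

```java
import java.util.HashMap;
import java.util.Map;

public class SchemaTradeoff {
    public static void main(String[] args) {
        // Column scheme: one cell per key. Reading 'city' touches only 'city',
        // and adding a field is a single new cell.
        Map<String, String> columns = new HashMap<>();
        columns.put("city", "Austin");
        columns.put("payments", "...");
        String city = columns.get("city");   // fetch one qualifier only
        columns.put("hasIphone6", "true");   // append one new cell

        // Blob scheme: the whole record is one serialized value. To read
        // 'city' or add a field, the entire blob must be decoded, modified,
        // re-encoded, and written back.
        String blob = "city=Austin;payments=...";
        Map<String, String> decoded = new HashMap<>();
        for (String pair : blob.split(";")) {
            String[] kv = pair.split("=", 2);
            decoded.put(kv[0], kv[1]);
        }
        decoded.put("hasIphone6", "true");   // read-modify-write of everything

        System.out.println("column read: " + city);
        System.out.println("blob fields after update: " + decoded.size());
    }
}
```

In real HBase, the column-scheme read corresponds to a `Get` restricted to one column qualifier, while the blob scheme always returns (and rewrites) the whole serialized value.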