What is meant by "HDFS lacks random read and write access"?

2 Answers

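The default HDFS block size is 128 MB, and HDFS is built for streaming through large blocks sequentially, not for reading one line here and one line there. A client can seek within a file, but each small read is expensive, and a file cannot be modified in place: you can only write a new file or append to an existing one. This is fine when you want to process the whole file. But it makes HDFS unsuitable for some applications, such as using an index to look up small records.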

HBase on the other hand is great for this. If you want to read a small record, you will only read that small record.

HBase uses HDFS as its backing store. So how does it provide efficient record-based access?

HBase caches table data from HDFS in memory or on local disk, so most reads do not go to HDFS. Mutations are stored first in an append-only journal. When the journal gets large, it is built into an "addendum" table. When there are too many addendum tables, they are all compacted into a brand new primary table. For reads, the journal is consulted first, then the addendum tables, and finally the primary table. With this system, a full HDFS block is written only when there is a full HDFS block's worth of changes.
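The journal / addendum table / primary table scheme can be sketched in a few lines of Python. This is only a toy in-memory model to illustrate the read and write order described above, not HBase's actual implementation. The class name `ToyStore` and the parameters `journal_limit` and `addendum_limit` are made up for the illustration, and nothing here touches HDFS.

```python
class ToyStore:
    """Toy model of the journal / addendum-table / primary-table scheme."""

    def __init__(self, journal_limit=2, addendum_limit=2):
        self.journal = []    # append-only list of (key, value) mutations
        self.addenda = []    # immutable "addendum" tables, oldest first
        self.primary = {}    # result of the last full compaction
        self.journal_limit = journal_limit    # hypothetical threshold
        self.addendum_limit = addendum_limit  # hypothetical threshold

    def put(self, key, value):
        # Writes only ever append to the journal.
        self.journal.append((key, value))
        if len(self.journal) >= self.journal_limit:
            self._flush()

    def _flush(self):
        # Build the journal into a new addendum table.
        # dict() keeps the last value written for a repeated key.
        self.addenda.append(dict(self.journal))
        self.journal = []
        if len(self.addenda) > self.addendum_limit:
            self._compact()

    def _compact(self):
        # Merge everything into a brand new primary table.
        merged = dict(self.primary)
        for table in self.addenda:  # oldest first, so newer values win
            merged.update(table)
        self.primary = merged
        self.addenda = []

    def get(self, key):
        # Read order: journal, then addendum tables (newest first),
        # then the primary table.
        for k, v in reversed(self.journal):
            if k == key:
                return v
        for table in reversed(self.addenda):
            if key in table:
                return table[key]
        return self.primary.get(key)


store = ToyStore()
store.put("a", 1)
store.put("b", 2)   # journal is full, so it becomes an addendum table
store.put("a", 3)   # newer value for "a" lives in the journal
print(store.get("a"), store.get("b"))
```

In this model the large sequential writes happen only in `_flush` and `_compact`, which is the step that would map to writing whole files on HDFS.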

A more thorough description of this approach is in the Bigtable whitepaper.

answered Oct 19 '22 05:10

Daniel Darabos

In a typical database, where the data is stored in tables in an RDBMS, you can read or write any record in any table without having to know what is in the other records. This is called random reading/writing.

But in HDFS, data is (generally) stored in file format rather than table format. So reading or writing an individual record is not as easy as it is in an RDBMS.
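To make "random reading/writing" concrete, here is a small sketch on an ordinary local file, not on HDFS. It assumes a made-up fixed-width record layout (`RECORD`) and hypothetical helper names. The point is that `read_record` and `update_record` jump straight to one record without touching the others. The in-place overwrite in `update_record` is the kind of operation HDFS does not allow, since HDFS files can only be written once and appended to.

```python
import os
import struct
import tempfile

# Hypothetical layout: a 4-byte integer id followed by a 16-byte name.
RECORD = struct.Struct("<i16s")


def write_records(path, rows):
    """Write all records sequentially, one after another."""
    with open(path, "wb") as f:
        for rec_id, name in rows:
            f.write(RECORD.pack(rec_id, name.encode("utf-8")))


def read_record(path, index):
    """Random read: seek directly to record number `index`."""
    with open(path, "rb") as f:
        f.seek(index * RECORD.size)
        rec_id, raw = RECORD.unpack(f.read(RECORD.size))
    return rec_id, raw.rstrip(b"\x00").decode("utf-8")


def update_record(path, index, rec_id, name):
    """Random write: overwrite one record in place."""
    with open(path, "r+b") as f:
        f.seek(index * RECORD.size)
        f.write(RECORD.pack(rec_id, name.encode("utf-8")))


path = os.path.join(tempfile.mkdtemp(), "records.bin")
write_records(path, [(1, "alice"), (2, "bob"), (3, "carol")])
update_record(path, 1, 2, "robert")   # change only the middle record
print(read_record(path, 1))
print(read_record(path, 2))
```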

answered Oct 19 '22 03:10

tacticurv


Tags:

hadoop

hbase

hdfs

lovespring
