How does HDFS append work?

Tags: append, size, hadoop, block, hdfs

Let's assume one is using the default block size (128 MB) and there is a 130 MB file, so it occupies one full-size block and one block holding 2 MB. Then 20 MB needs to be appended to the file (the total should now be 150 MB). What happens?

Does HDFS actually resize the last block from 2 MB to 22 MB? Or does it create a new block?

How does appending to a file in HDFS deal with concurrency? Is there a risk of data loss?

Does HDFS create a third block, put the 20 + 2 MB in it, and delete the block holding 2 MB? If so, how does this work concurrently?

594

asked Feb 06 '12 15:02

David

1 Answer

According to the latest design document in the Jira issue mentioned before, the answers to your questions are as follows:

HDFS will append to the last block; it does not create a new block and copy the data over from the old last block. This is not difficult, because HDFS stores each block as an ordinary file on the datanode's local filesystem, and ordinary filesystems have mechanisms for appending new data. In your example, the last block simply grows from 2 MB to 22 MB. Of course, once the last block fills up, a new block is created.
Only a single write or append to any given file is allowed at a time in HDFS, so there is no concurrency to handle. This is managed by the namenode. You need to close a file before someone else can begin writing to it.
If the last block in a file is not replicated, the append will fail. The append is written to a single replica, which pipelines it to the other replicas, similar to a normal write. It seems to me that there is no extra risk of data loss compared to a normal write.
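
To make the first two points concrete, here is a small toy model in Python. It is not HDFS code and uses no Hadoop API: `append` and `ToyNameNode` are hypothetical names invented for this sketch. It assumes only the behaviour described above, namely that an append fills the last block before starting a new one and that the namenode allows one writer per file at a time.

```python
# Toy model only: it mimics the behaviour described above, not real HDFS code.
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # default block size assumed in the question


def append(blocks, nbytes, block_size=BLOCK_SIZE):
    """Return the block lengths after appending nbytes to a file.

    The last block is filled first; a new block starts only once it is full.
    """
    blocks = list(blocks)  # leave the caller's list untouched
    while nbytes > 0:
        if not blocks or blocks[-1] == block_size:
            blocks.append(0)  # last block is full (or the file is empty)
        written = min(block_size - blocks[-1], nbytes)
        blocks[-1] += written
        nbytes -= written
    return blocks


class ToyNameNode:
    """Allows at most one writer per path (hypothetical helper)."""

    def __init__(self):
        self._writers = {}  # path -> client currently writing the file

    def open_for_append(self, path, client):
        holder = self._writers.get(path)
        if holder is not None:
            raise RuntimeError(f"{path} is already being written by {holder}")
        self._writers[path] = client

    def close(self, path, client):
        # Only the current writer can release the file.
        if self._writers.get(path) == client:
            del self._writers[path]


if __name__ == "__main__":
    layout = append([128 * MB, 2 * MB], 20 * MB)
    print([size // MB for size in layout])  # prints [128, 22]
```

For the file in the question, the model keeps two blocks and grows the second one to 22 MB. A second client calling `open_for_append` on the same path gets an error until the first client calls `close`.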

197

answered Jan 02 '23 23:01

EthanP
