At the moment I have a user file system in my application (Apache CMIS). As it's growing bigger, I'm considering a move to Hadoop (HDFS), since we need to run some statistics on it as well. The problem: the current file system provides versioning of the files. When I read about Hadoop (HDFS) and file versioning, I mostly found that I would have to write this versioning layer myself. Is there already something available to manage versioning of files in HDFS, or do I really have to write it myself? (I don't want to reinvent the wheel, but I can't find a proper solution either.)
Answer
Hadoop (HDFS) doesn't support versioning of files. You can get this functionality by combining Hadoop with Amazon S3: Hadoop uses S3 as the file system (without HDFS-style blocks; durability and recovery are provided by S3), and S3's object versioning covers the versioning of files. Hadoop still uses YARN for the distributed processing.
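As a rough sketch of that setup (assuming the hadoop-aws module is on the classpath and a bucket named my-bucket, both placeholders here), Hadoop is pointed at S3 through the s3a:// filesystem; versioning of overwritten objects is then handled on the S3 side, not by Hadoop:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class S3aExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Credentials: in practice these usually come from environment
        // variables or an IAM role rather than being hard-coded.
        conf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY");   // placeholder
        conf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY");   // placeholder

        // Open the bucket through the s3a filesystem implementation
        // provided by the hadoop-aws module.
        FileSystem fs = FileSystem.get(URI.create("s3a://my-bucket/"), conf);

        // List what is stored under /data in the bucket. With bucket
        // versioning enabled on the S3 side, overwritten objects keep
        // their previous versions (managed by S3, not by Hadoop).
        for (FileStatus status : fs.listStatus(new Path("/data"))) {
            System.out.println(status.getPath());
        }
    }
}
```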
Namenode directory structure: this mechanism provides resilience, particularly if one of the directories is an NFS mount, as is recommended. Note that the VERSION file found there is a Java properties file containing information about the version of the HDFS software that is running; it is unrelated to versioning of user files.
You cannot modify data once it is stored in HDFS, because HDFS follows a write-once-read-many model. You can only append to data that is already stored in HDFS.
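To illustrate the write-once model, here is a minimal sketch (the file path is made up) that appends to an existing HDFS file using the FileSystem API; rewriting the file in place is not supported:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsAppendExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical path; the file must already exist in HDFS.
        Path file = new Path("/user/demo/log.txt");

        // HDFS does not allow in-place updates, but appending to the
        // end of an existing file is supported.
        try (FSDataOutputStream out = fs.append(file)) {
            out.writeBytes("new record\n");
        }
    }
}
```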
How does HDFS store data?
HDFS divides files into blocks and stores each block on a DataNode. Multiple DataNodes are linked to the master node in the cluster, the NameNode. The master node distributes replicas of these data blocks across the cluster.
HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block are the same size. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file.
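As a small sketch of those per-file settings (the path and values are made up), one of the FileSystem.create overloads lets you choose the replication factor and block size when a file is written, and setReplication can change the replication of an existing file:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSettingsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/stats.csv"); // hypothetical path

        // Create the file with a replication factor of 2 and a 64 MB
        // block size instead of the cluster-wide defaults.
        try (FSDataOutputStream out = fs.create(
                file,
                true,                 // overwrite if it exists
                4096,                 // I/O buffer size in bytes
                (short) 2,            // replication factor
                64L * 1024 * 1024)) { // block size in bytes
            out.writeBytes("id,value\n");
        }

        // The replication factor of an existing file can be changed later.
        fs.setReplication(file, (short) 3);
    }
}
```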
Versioning is not possible with HDFS.
Instead you can use Amazon S3, which provides versioning and is also compatible with Hadoop.