
Hadoop dfs replicate

Tags:

hadoop

hdfs

Sorry, this is a simple question, but I cannot find this exact question on Google. What does dfs.replication mean? Suppose I create a file named filmdata.txt in HDFS. If I set dfs.replication=1, is there just one file in total (one filmdata.txt), or will Hadoop create another replica besides the main file (filmdata.txt)? In short: with dfs.replication=1, is there one filmdata.txt in total, or two? Thanks in advance.

Jack asked Oct 11 '12 08:10


2 Answers

The total number of copies of the file (more precisely, of each of its blocks) in the file system is the value of dfs.replication. So, if you set dfs.replication=1, there will be only one copy of the file in the file system; no additional replica is created.

Check the Apache Documentation for the other configuration parameters.

Praveen Sripati answered Sep 22 '22 00:09


To ensure high availability of data, Hadoop replicates the data.

When a file is stored in HDFS, the Hadoop framework splits it into a set of blocks (64 MB or 128 MB by default, depending on the Hadoop version), and these blocks are then replicated across the cluster nodes. The dfs.replication setting specifies how many replicas of each block are kept.

The default value for dfs.replication is 3, but this is configurable depending on your cluster setup.
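
To make the arithmetic concrete, here is a rough Python sketch of how block count and replication combine. It does not talk to a cluster; the function name hdfs_storage_footprint and the 300 MB file size are made up for illustration, and the 128 MB block size is just one of the defaults mentioned above.

```python
MB = 1024 * 1024


def hdfs_storage_footprint(file_size_bytes, block_size_bytes, replication):
    """Estimate (blocks, total_block_replicas, raw_bytes) for a single file.

    Hypothetical helper for illustration only; it models the arithmetic,
    not the actual NameNode bookkeeping.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0 or replication < 1:
        raise ValueError("sizes must be non-negative and replication >= 1")
    # Integer ceiling division: the last block may be only partially filled.
    blocks = (file_size_bytes + block_size_bytes - 1) // block_size_bytes
    # Every block is stored `replication` times across the cluster.
    total_block_replicas = blocks * replication
    # A partial last block only occupies its real size, so raw usage is
    # simply the file size multiplied by the replication factor.
    raw_bytes = file_size_bytes * replication
    return blocks, total_block_replicas, raw_bytes


if __name__ == "__main__":
    # Assumed example: a 300 MB filmdata.txt with 128 MB blocks.
    for replication in (1, 3):
        blocks, replicas, raw = hdfs_storage_footprint(300 * MB, 128 * MB, replication)
        print(f"dfs.replication={replication}: {blocks} blocks, "
              f"{replicas} block replicas, {raw // MB} MB raw storage")
```

With dfs.replication=1 the file's 3 blocks each exist once, so there is a single filmdata.txt and no extra copy. With dfs.replication=3 there is still one file in the namespace, but each block is stored three times.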

Hope this helps.

Ramana answered Sep 21 '22 00:09