What is the maximum number of files and directories allowed in an HDFS (Hadoop) directory?
Files in HDFS are broken into block-sized chunks called data blocks. These blocks are stored as independent units. The size of these HDFS data blocks is 128 MB by default.
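A quick way to check what block size a cluster is actually configured with, or to override it for a single upload, is sketched below (the file and directory names are hypothetical, and the -D generic option is assumed to be accepted by your hdfs dfs shell):

# Print the configured default block size in bytes (134217728 = 128 MB)
hdfs getconf -confKey dfs.blocksize

# Hypothetical example: upload one file with a 256 MB block size instead of the default
hdfs dfs -D dfs.blocksize=268435456 -put ./bigfile.dat /data/bigfile.dat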
A file or directory has separate permissions for the user that owns it, for other users that are members of its group, and for all other users. For files, the r permission is required to read the file, and the w permission is required to write or append to it.
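For example (the paths are hypothetical), permissions can be inspected and changed with the usual shell commands:

# The first column of the listing shows the rwx permissions for owner, group, and others
hdfs dfs -ls /data

# Hypothetical example: owner gets read/write, group gets read-only, others get no access
hdfs dfs -chmod 640 /data/report.csv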
Use the hdfs dfs -du command to get the size of a directory in HDFS; the -x option excludes snapshots from the result. Snapshots are read-only, point-in-time copies of a folder structure in HDFS, usually used by Hadoop administrators to preserve the files and folders as they existed at a particular moment.
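A minimal sketch of that command (the path is hypothetical):

# Summarized (-s), human-readable (-h) size of a directory, excluding snapshot data (-x)
hdfs dfs -du -s -h -x /user/alice/data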
In modern Apache Hadoop versions, various HDFS limits are controlled by configuration properties with fs-limits in the name, all of which have reasonable default values. This question specifically asks about the number of children in a directory, which is governed by dfs.namenode.fs-limits.max-directory-items; its default value is 1048576.
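To see the value your own cluster is configured with, and how many direct children a given directory currently has, something like the following can be used (the path is hypothetical):

# Print the configured directory-item limit as seen by the client configuration
hdfs getconf -confKey dfs.namenode.fs-limits.max-directory-items

# The 'Found N items' header of a directory listing shows its number of direct children
hdfs dfs -ls /user/alice/data | head -1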
Refer to the Apache Hadoop documentation for hdfs-default.xml for the full list of fs-limits configuration properties and their default values; they are copied here for convenience:
<property>
  <name>dfs.namenode.fs-limits.max-component-length</name>
  <value>255</value>
  <description>Defines the maximum number of bytes in UTF-8 encoding in each
    component of a path. A value of 0 will disable the check.</description>
</property>

<property>
  <name>dfs.namenode.fs-limits.max-directory-items</name>
  <value>1048576</value>
  <description>Defines the maximum number of items that a directory may
    contain. Cannot set the property to a value less than 1 or more than
    6400000.</description>
</property>

<property>
  <name>dfs.namenode.fs-limits.min-block-size</name>
  <value>1048576</value>
  <description>Minimum block size in bytes, enforced by the Namenode at create
    time. This prevents the accidental creation of files with tiny block
    sizes (and thus many blocks), which can degrade
    performance.</description>
</property>

<property>
  <name>dfs.namenode.fs-limits.max-blocks-per-file</name>
  <value>1048576</value>
  <description>Maximum number of blocks per file, enforced by the Namenode on
    write. This prevents the creation of extremely large files which can
    degrade performance.</description>
</property>

<property>
  <name>dfs.namenode.fs-limits.max-xattrs-per-inode</name>
  <value>32</value>
  <description>
    Maximum number of extended attributes per inode.
  </description>
</property>

<property>
  <name>dfs.namenode.fs-limits.max-xattr-size</name>
  <value>16384</value>
  <description>
    The maximum combined size of the name and value of an extended attribute
    in bytes. It should be larger than 0, and less than or equal to maximum
    size hard limit which is 32768.
  </description>
</property>
All of these settings use reasonable default values as decided upon by the Apache Hadoop community. It is generally recommended that users do not tune these values except in very unusual circumstances.
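If one of these limits ever does need to be changed, it would be overridden in hdfs-site.xml on the NameNode, followed by a NameNode restart; a hedged sketch with a purely illustrative value:

<!-- hdfs-site.xml on the NameNode: hypothetical override of the directory-item limit -->
<property>
  <name>dfs.namenode.fs-limits.max-directory-items</name>
  <!-- illustrative value only; must stay between 1 and 6400000 -->
  <value>2097152</value>
</property>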
From http://blog.cloudera.com/blog/2009/02/the-small-files-problem/:
Every file, directory and block in HDFS is represented as an object in the namenode’s memory, each of which occupies 150 bytes, as a rule of thumb. So 10 million files, each using a block, would use about 3 gigabytes of memory. Scaling up much beyond this level is a problem with current hardware. Certainly a billion files is not feasible.
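That rule of thumb is easy to sanity-check: 10 million files, each with one block, is roughly 20 million namenode objects, and 20,000,000 × 150 bytes ≈ 3,000,000,000 bytes, i.e. about 3 GB of heap.

# Back-of-the-envelope check of the figure quoted above
echo $(( (10000000 + 10000000) * 150 ))   # prints 3000000000 (~3 GB)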