A colleague of mine thinks that HDFS has no maximum file size, i.e., by partitioning a file into 128/256 MB blocks, a file of any size can be stored (obviously the total HDFS disk capacity is a limit, but is that the only one?). I can't find anything saying there is a limit, so is she correct?
thanks, jim
For maximum file size, there isn't much to configure except the block size (each file is made up of multiple blocks). There is no limit on the file size itself.
The practical constraint is NameNode memory rather than file size. As a rule of thumb, each file object and each block object takes roughly 150 bytes of NameNode RAM. To store 100 MB as 100 separate 1 MB files, we need 100 file objects plus 100 block objects, i.e. about 150 x 200 = 30,000 bytes of NameNode memory. Consider instead a single file "IdealFile" of size 100 MB: it needs only one block, B1, replicated on Machine 1, Machine 2, and Machine 3, and occupies only about 300 bytes of NameNode RAM.
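To make that arithmetic concrete, here is a minimal sketch of the estimate, assuming the commonly quoted ~150 bytes of NameNode heap per file object and per block object (the class name and numbers are illustrative, not exact figures):

```java
// Back-of-the-envelope NameNode memory estimate.
// Assumption: ~150 bytes of heap per file object and per block object.
public class NameNodeMemoryEstimate {

    static final long BYTES_PER_OBJECT = 150;

    // One metadata object per file plus one per block of that file.
    static long estimateBytes(long fileCount, long blocksPerFile) {
        return fileCount * BYTES_PER_OBJECT
             + fileCount * blocksPerFile * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        // 100 files of 1 MB each (one block per file): ~30,000 bytes
        System.out.println("100 x 1 MB files: " + estimateBytes(100, 1) + " bytes");
        // One 100 MB file ("IdealFile", one block): ~300 bytes
        System.out.println("1 x 100 MB file : " + estimateBytes(1, 1) + " bytes");
    }
}
```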
Files in HDFS are broken into block-sized chunks called data blocks, and these blocks are stored as independent units. The default block size is 128 MB in Hadoop 2.x and later (it was 64 MB in Hadoop 1.x), and it can be configured manually via dfs.blocksize.
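For example, the block size can be set cluster-wide in hdfs-site.xml or per file through the Java FileSystem API. A minimal sketch, assuming the Hadoop client libraries are on the classpath and using an illustrative path and sizes:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Default block size for files created by this client (256 MB here).
        conf.setLong("dfs.blocksize", 256L * 1024 * 1024);
        FileSystem fs = FileSystem.get(conf);

        // The block size can also be set per file on create():
        // create(path, overwrite, bufferSize, replication, blockSize)
        FSDataOutputStream out = fs.create(
                new Path("/tmp/bigfile.dat"),
                true,               // overwrite if it exists
                4096,               // I/O buffer size in bytes
                (short) 3,          // replication factor
                128L * 1024 * 1024  // block size for this particular file
        );
        out.writeBytes("hello hdfs");
        out.close();
    }
}
```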
Well, there is obviously a practical limit. But physically, HDFS block IDs are Java longs, so there can be at most 2^63 of them; with a 64 MB (2^26 bytes) block size, the maximum addressable size is 2^63 x 2^26 = 2^89 bytes, which is 512 yottabytes.
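A quick check of that arithmetic (taking a yottabyte in the binary sense of 2^80 bytes):

```java
import java.math.BigInteger;

public class MaxAddressableSize {
    public static void main(String[] args) {
        BigInteger two = BigInteger.valueOf(2);
        BigInteger maxBlockIds = two.pow(63);          // block IDs are Java longs
        BigInteger blockSizeBytes = two.pow(26);       // 64 MB per block
        BigInteger totalBytes = maxBlockIds.multiply(blockSizeBytes); // 2^89 bytes
        BigInteger yottabyte = two.pow(80);            // 1 YB (binary) = 2^80 bytes
        System.out.println(totalBytes.divide(yottabyte) + " yottabytes"); // prints 512
    }
}
```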
I think she's right in saying there's no maximum file size on HDFS. The only thing you can really set is the block size, which defaults to 64 MB in older releases (128 MB in current ones). Files of any size can be stored; the only constraint is that the bigger the file, the more hardware (disk capacity and NameNode memory) is needed to accommodate it.