How does stat command calculate the blocks of a file?

Tags:

filesystems

I am wondering how the stat command calculates the number of blocks for a file. I read this article that says:

The value st_blocks gives the size of the file in 512-byte blocks. (This may be smaller than st_size/512 e.g. when the file has holes.) The value st_blksize gives the "preferred" blocksize for efficient file system I/O. (Writing to a file in smaller chunks may cause an inefficient read-modify-rewrite.)

Yet I cannot verify this with my own tests.

My file system is ext3.

The command dumpe2fs -h /dev/sda3 shows:

...
First block: 0
Block size: 4096
Fragment size: 4096
...

Then I run

kent@KentT60:~/Desktop$ stat Email
File: `Email'
Size: 965 Blocks: 8 IO Block: 4096 regular file
Device: 80ah/2058d Inode: 746095 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ kent) Gid: ( 1000/ kent)
Access: 2009-08-11 21:36:36.000000000 +0200
Modify: 2009-08-11 21:36:35.000000000 +0200
Change: 2009-08-11 21:36:35.000000000 +0200

If "Blocks" here means: "how many 512 bytes blocks", the number should be 2, not 8. I thought that the block size of the file system (IO block) is 4k.

If the file system gets the file Email, it will fetch a minimum of 4k from the disk (8 x 512 bytes blocks), which means 965/512 + 6 = 8. I am not sure if this guess is correct.

Another test:

kent@KentT60:~/Desktop$ stat wxPython-demo-2.8.10.1.tar.bz2
File: `wxPython-demo-2.8.10.1.tar.bz2'
Size: 3605257 Blocks: 7056 IO Block: 4096 regular file
Device: 80ah/2058d Inode: 746210 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ kent) Gid: ( 1000/ kent)
Access: 2009-08-12 21:45:45.000000000 +0200
Modify: 2009-08-12 21:43:46.000000000 +0200
Change: 2009-08-12 21:43:46.000000000 +0200


3605257/512=7041.xx = 7042

Following my guess above, this would be 7042 + 6 = 7048. But the stat result shows 7056.

And another example from the internet at https://www.computerhope.com/unix/stat.htm. I pasted the example at the bottom of the page here:

File: `index.htm'
Size: 17137 Blocks: 40 IO Block: 8192 regular file
Device: 8h/8d Inode: 23161443 Links: 1
Access: (0644/-rw-r--r--) Uid: (17433/comphope) Gid: ( 32/ www)
Access: 2007-04-03 09:20:18.000000000 -0600
Modify: 2007-04-01 23:13:05.000000000 -0600
Change: 2007-04-02 16:36:21.000000000 -0600

In this example, the file system block size is 8k. I suppose the "Blocks" value should be 16xN, but it is 40. I'm getting lost...

Can anyone explain how stat calculates the "Blocks" value?

Thanks!

582

asked Aug 28 '09 12:08

1 Answers

The stat command-line tool uses the stat / fstat etc. functions, which return data in the stat structure. The st_blocks member of the stat structure returns:

The total number of physical blocks of size 512 bytes actually allocated on disk. This field is not defined for block special or character special files.

So for your "Email" example, with a size of 965 and a block count of 8, it is indicating that 8*512=4096 bytes are physically allocated on disk. The reason it's not 2 is that the file system on disk does not allocate space in units of 512, it evidently allocates them in units of 4096. (And the unit of allocation may vary depending on file size and filesystem sophistication. E.g. ZFS supports different units of allocation.)

Similarly, for the wxPython example, it indicates that 7056*512 bytes, or 3612672 bytes are physically allocated on disk. You get the idea.

The IO block size is "a hint as to the 'best' unit size for I/O operations" - it's usually the unit of allocation on the physical disk. Don't get confused between the IO block and the block that stat uses to indicate physical size; the blocks for physical size are always 512 bytes.

Update based on comment:

Like I said, st_blocks is how the OS indicates how much space is used by the file on disk. The actual units of allocation on disk are the choice of the file system. For example, ZFS can have allocation blocks of variable size, even in the same file, because of the way it allocates blocks: files start out having a small block size, and the block size keeps on increasing until it reaches a particular point. If the file is later truncated, it will probably keep the old block size. So based on the history of the file, it can have multiple possible block sizes. So given a file size it is not always obvious why it has a particular physical size.

Concrete example: on my Solaris box, with a ZFS file system, I can create a very short file:

$ echo foo > test
$ stat test
  Size: 4               Blocks: 2          IO Block: 512    regular file
(irrelevant details omitted)

OK, small file, 2 blocks, physical disk usage is 1024 for this file.

$ dd if=/dev/zero of=test2 bs=8192 count=4
$ stat test2
  Size: 32768           Blocks: 65         IO Block: 32768  regular file

OK, now we see physical disk usage of 32.5K, and an IO block size of 32K. I then copied it to test3 and truncated this test3 file in an editor:

$ cp test2 test3
$ joe -hex test3
$ stat test3
  Size: 4               Blocks: 65         IO Block: 32768  regular file

Well now, here's a file with 4 bytes in it - just like test - but it's using 32.5K physically on the disk, because of the way the ZFS file system allocates space. Block sizes increase as the file gets larger, but they don't decrease when the file gets smaller. (And yes, this can lead to substantial wasted space depending on the kinds of files and file operations you do on ZFS, which is why it allows you to set the maximum block size on a per-filesystem basis, and change it dynamically.)

Hopefully, you can now appreciate that there isn't necessarily a simple relationship between file size and physical disk usage. Even in the above it's not clear why 32.5K bytes are needed to store a file that's exactly 32K in size - it appears that ZFS generally needs an extra 512 bytes for extra storage of its own. Perhaps it's using that storage for checksums, reference counts, transaction state - file system bookkeeping. By including these extras in the indicated physical file size, it seems like ZFS is trying not to mislead the user as to the physical costs of the file. That doesn't mean it's trivial to reverse-engineer the calculation without knowing intimate details about the underlying file system implementation.

answered Sep 28 '22 17:09

Barry Kelly

Related questions
                            
                                Comparison of GUI development tools for linux [closed]
                            
                                MySQL the command line and pagers
                            
                                How to determine the process memory limit in Linux?
                            
                                why is my c program suddenly using 30g of virtual memory?
                            
                                Possible to pipe into an if-statement?
                            
                                RTNETLINK answers :No such file or directory error
                            
                                linux curl save as utf-8
                            
                                How to create 4KB Linux binaries that render a 3D scene?
                            
                                What will happen when I edit a script while it's running?
                            
                                /proc/$pid/maps shows pages with no rwx permissions on x86_64 linux
                            
                                Error: Could not find or load main class xxx Linux
                            
                                What is EOF!! in the bash script?
                            
                                Where is Linux CFS Scheduler Code?
                            
                                In socket programming, why a client is not bind to an address?
                            
                                Detect the presence of a device when it's hot plugged in Linux
                            
                                how to create docker overlay network between multi hosts?
                            
                                How to use getopts option without argument at the end in bash
                            
                                Reset bash history search position
                            
                                How to be sure ClamAV database is up to date?
                            
                                Pass file content to docker exec

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How does stat command calculate the blocks of a file?

Tags:

linux

filesystems

Kent

People also ask

1 Answers

Barry Kelly

Recent Activity

Donate For Us