Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does stat command calculate the blocks of a file?

I am wondering how the stat command calculates the number of blocks for a file. I read this article that says:

The value st_blocks gives the size of the file in 512-byte blocks. (This may be smaller than st_size/512 e.g. when the file has holes.) The value st_blksize gives the "preferred" blocksize for efficient file system I/O. (Writing to a file in smaller chunks may cause an inefficient read-modify-rewrite.)

Yet I cannot verify this with my own tests.

My file system is ext3.

The command dumpe2fs -h /dev/sda3 shows:

...
First block: 0
Block size: 4096
Fragment size: 4096
...

Then I run

kent@KentT60:~/Desktop$ stat Email
File: `Email'
Size: 965 Blocks: 8 IO Block: 4096 regular file
Device: 80ah/2058d Inode: 746095 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ kent) Gid: ( 1000/ kent)
Access: 2009-08-11 21:36:36.000000000 +0200
Modify: 2009-08-11 21:36:35.000000000 +0200
Change: 2009-08-11 21:36:35.000000000 +0200

If "Blocks" here means: "how many 512 bytes blocks", the number should be 2, not 8. I thought that the block size of the file system (IO block) is 4k.

If the file system gets the file Email, it will fetch a minimum of 4k from the disk (8 x 512 bytes blocks), which means 965/512 + 6 = 8. I am not sure if this guess is correct.

Another test:

kent@KentT60:~/Desktop$ stat wxPython-demo-2.8.10.1.tar.bz2
File: `wxPython-demo-2.8.10.1.tar.bz2'
Size: 3605257 Blocks: 7056 IO Block: 4096 regular file
Device: 80ah/2058d Inode: 746210 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ kent) Gid: ( 1000/ kent)
Access: 2009-08-12 21:45:45.000000000 +0200
Modify: 2009-08-12 21:43:46.000000000 +0200
Change: 2009-08-12 21:43:46.000000000 +0200


3605257/512=7041.xx = 7042

Following my guess above, this would be 7042 + 6 = 7048. But the stat result shows 7056.

And another example from the internet at https://www.computerhope.com/unix/stat.htm. I pasted the example at the bottom of the page here:

File: `index.htm'
Size: 17137 Blocks: 40 IO Block: 8192 regular file
Device: 8h/8d Inode: 23161443 Links: 1
Access: (0644/-rw-r--r--) Uid: (17433/comphope) Gid: ( 32/ www)
Access: 2007-04-03 09:20:18.000000000 -0600
Modify: 2007-04-01 23:13:05.000000000 -0600
Change: 2007-04-02 16:36:21.000000000 -0600

In this example, the file system block size is 8k. I suppose the "Blocks" value should be 16xN, but it is 40. I'm getting lost...

Can anyone explain how stat calculates the "Blocks" value?

Thanks!

like image 582
Kent Avatar asked Aug 28 '09 12:08

Kent


People also ask

What does the stat command do?

The stat is a command which gives information about the file and filesystem. Stat command gives information such as the size of the file, access permissions and the user ID and group ID, birth time access time of the file. Stat command has another feature, by which it can also provide the file system information.

What are stat blocks?

In the statistical theory of the design of experiments, blocking is the arranging of experimental units in groups (blocks) that are similar to one another. Typically, a blocking factor is a source of variability that is not of primary interest to the experimenter.

Which block contains statistical information to keep track of the file system?

Superblock. The superblock is 4096 bytes in size and starts at byte offset 4096 on the disk. The super- block maintains information about the entire file system and includes the following fields: Size of the file system.

Which command is used to determine state files?

stat command is a useful utility for viewing file or file system status.


1 Answers

The stat command-line tool uses the stat / fstat etc. functions, which return data in the stat structure. The st_blocks member of the stat structure returns:

The total number of physical blocks of size 512 bytes actually allocated on disk. This field is not defined for block special or character special files.

So for your "Email" example, with a size of 965 and a block count of 8, it is indicating that 8*512=4096 bytes are physically allocated on disk. The reason it's not 2 is that the file system on disk does not allocate space in units of 512, it evidently allocates them in units of 4096. (And the unit of allocation may vary depending on file size and filesystem sophistication. E.g. ZFS supports different units of allocation.)

Similarly, for the wxPython example, it indicates that 7056*512 bytes, or 3612672 bytes are physically allocated on disk. You get the idea.

The IO block size is "a hint as to the 'best' unit size for I/O operations" - it's usually the unit of allocation on the physical disk. Don't get confused between the IO block and the block that stat uses to indicate physical size; the blocks for physical size are always 512 bytes.

Update based on comment:

Like I said, st_blocks is how the OS indicates how much space is used by the file on disk. The actual units of allocation on disk are the choice of the file system. For example, ZFS can have allocation blocks of variable size, even in the same file, because of the way it allocates blocks: files start out having a small block size, and the block size keeps on increasing until it reaches a particular point. If the file is later truncated, it will probably keep the old block size. So based on the history of the file, it can have multiple possible block sizes. So given a file size it is not always obvious why it has a particular physical size.

Concrete example: on my Solaris box, with a ZFS file system, I can create a very short file:

$ echo foo > test
$ stat test
  Size: 4               Blocks: 2          IO Block: 512    regular file
(irrelevant details omitted)

OK, small file, 2 blocks, physical disk usage is 1024 for this file.

$ dd if=/dev/zero of=test2 bs=8192 count=4
$ stat test2
  Size: 32768           Blocks: 65         IO Block: 32768  regular file

OK, now we see physical disk usage of 32.5K, and an IO block size of 32K. I then copied it to test3 and truncated this test3 file in an editor:

$ cp test2 test3
$ joe -hex test3
$ stat test3
  Size: 4               Blocks: 65         IO Block: 32768  regular file

Well now, here's a file with 4 bytes in it - just like test - but it's using 32.5K physically on the disk, because of the way the ZFS file system allocates space. Block sizes increase as the file gets larger, but they don't decrease when the file gets smaller. (And yes, this can lead to substantial wasted space depending on the kinds of files and file operations you do on ZFS, which is why it allows you to set the maximum block size on a per-filesystem basis, and change it dynamically.)

Hopefully, you can now appreciate that there isn't necessarily a simple relationship between file size and physical disk usage. Even in the above it's not clear why 32.5K bytes are needed to store a file that's exactly 32K in size - it appears that ZFS generally needs an extra 512 bytes for extra storage of its own. Perhaps it's using that storage for checksums, reference counts, transaction state - file system bookkeeping. By including these extras in the indicated physical file size, it seems like ZFS is trying not to mislead the user as to the physical costs of the file. That doesn't mean it's trivial to reverse-engineer the calculation without knowing intimate details about the underlying file system implementation.

like image 70
Barry Kelly Avatar answered Sep 28 '22 17:09

Barry Kelly