
Used space calculated by statvfs() for a file system is greater than the sum of the sizes of all files in the fs

I have a small 50 MiB partition, formatted as ext4 and mounted on /mnt/tmp, that contains only one directory holding a set of photos.

I then use statvfs() to calculate the bytes used in the partition and lstat() to calculate the size of every file inside it. For this I wrote the following program:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <sys/statvfs.h>
#include <stdint.h>
#include <string.h>
#include <dirent.h>
#include <stdlib.h>

//The amount of bytes of all files found
uint64_t totalFilesSize=0;

//Size for a sector in the fs
unsigned int sectorSize=0;

void readDir(char *path) {
    DIR *directory;
    struct dirent *d_file;  // a file in *directory

    directory = opendir (path);
    if (directory == NULL) {
        perror(path);
        return;
    }

    while ((d_file = readdir (directory)) != 0)
    {
        struct stat filestat;
        char *abPath=malloc(1024);
        if (abPath == NULL) {
            perror("malloc");
            break;
        }
        // snprintf cannot overflow the buffer, unlike strcpy/strcat
        snprintf(abPath, 1024, "%s/%s", path, d_file->d_name);

        if (lstat (abPath, &filestat) != 0) {
            perror(abPath);
            free(abPath);
            continue;
        }

        switch (filestat.st_mode & S_IFMT)
        {
        case S_IFDIR:
        {
            if (strcmp (".", d_file->d_name) && strcmp ("..", d_file->d_name))
            {
                printf("File: %s\nSize: %lld\n\n", abPath, (long long)filestat.st_size);

                //Add slack space to the final sum
                int slack=sectorSize-(filestat.st_size%sectorSize);

                totalFilesSize+=filestat.st_size+slack;

                readDir(abPath);
            }
            break;
        }
        case S_IFREG:
        {
            printf("File: %s\nSize: %lld\n\n", abPath, (long long)filestat.st_size);

            //Add slack space to the final sum
            int slack=sectorSize-(filestat.st_size%sectorSize);

            totalFilesSize+=filestat.st_size+slack;

            break;
        }
        }

        free(abPath);
    }

    closedir (directory);
}

int main (int argc, char **argv) {

    if(argc!=2) {
        printf("Error: Missing required parameter.\n");
        return -1;
    }

    struct statvfs info;
    if (statvfs (argv[1], &info) != 0) {
        perror("statvfs");
        return 1;
    }

    sectorSize=info.f_bsize; //Setting global variable

    //f_blocks and f_bfree are counted in f_frsize units (same value as f_bsize on ext4)
    uint64_t usedBytes=(uint64_t)(info.f_blocks-info.f_bfree)*info.f_frsize;

    readDir(argv[1]);

    printf("Total blocks: %llu\nFree blocks: %llu\nSize of block: %lu\n\
Size in bytes: %llu\nTotal Files size: %llu\n",
            (unsigned long long)info.f_blocks, (unsigned long long)info.f_bfree,
            info.f_bsize, (unsigned long long)usedBytes,
            (unsigned long long)totalFilesSize);

    return 0;
}

When I pass the mount point of the partition (/mnt/tmp) as the parameter, the program prints this output:

File: /mnt/tmp/lost+found
Size: 12288

File: /mnt/tmp/photos
Size: 1024

File: /mnt/tmp/photos/IMG_3195.JPG
Size: 2373510

File: /mnt/tmp/photos/IMG_3200.JPG
Size: 2313695

File: /mnt/tmp/photos/IMG_3199.JPG
Size: 2484189

File: /mnt/tmp/photos/IMG_3203.JPG
Size: 2494687

File: /mnt/tmp/photos/IMG_3197.JPG
Size: 2259056

File: /mnt/tmp/photos/IMG_3201.JPG
Size: 2505596

File: /mnt/tmp/photos/IMG_3202.JPG
Size: 2306304

File: /mnt/tmp/photos/IMG_3204.JPG
Size: 2173883

File: /mnt/tmp/photos/IMG_3198.JPG
Size: 2390122

File: /mnt/tmp/photos/IMG_3196.JPG
Size: 2469315

Total blocks: 47249
Free blocks: 19160
Size of block: 1024
Size in bytes: 28763136
Total Files size: 23790592

Note the last two lines. On a FAT32 file system the two amounts are the same, but on ext4 they differ.

So the question is: why?

jlledom asked Dec 31 '11 14:12


2 Answers

statvfs() is a filesystem-level operation. The space used will be calculated from the point of view of the filesystem. Therefore:

  1. It includes the filesystem's own structures. For filesystems based on the traditional Unix design, that means the inodes and any indirect blocks.

    On some of my systems I typically have a 256-byte inode per 32KB of space for the root partition. Smaller partitions may have even higher inode density, to provide sufficient inodes for a large number of files - I believe that the mke2fs default is one inode per 16KB of space.

    Creating an 850 MB Ext4 filesystem with the default options results in a filesystem with about 54,000 inodes that consume over 13MB of space.

  2. For Ext3/Ext4 that will also include the journal, which has a minimum size of 1024 filesystem blocks. For the common block size of 4KB that is a minimum of 4MB per filesystem.

    An 850 MB Ext4 filesystem will have a 16MB journal by default.

  3. The result from statvfs() will also include any deleted, yet still open, files - this often happens on partitions housing tmp directories for use by applications.

  4. To see the actual space used by a file with lstat(), you need to take the st_blocks field of the stat structure and multiply it by 512. Judging by the sizes displayed in your program output, you are using the st_size field, which is the exact file size in bytes. This will typically be smaller than the actual space used - a 5KB file will actually use 8KB on a filesystem with 4KB blocks.

    Conversely, a sparse file will use fewer blocks than its file size indicates.

As such, the additional space usage mentioned above adds up to a rather noticeable amount, which explains the discrepancy that you are seeing.

EDIT:

  1. I just noticed the slack space handling in your program. Although that is not the recommended way to calculate the actual used space (as opposed to the apparent one), it seems to work, so you are not missing space there. On the other hand, you are missing the space used for the root directory of the filesystem, although that would probably be only a single block or two :-)

  2. You might want to have a look at the output of tune2fs -l /dev/xxx. It lists several relevant numbers, including space reserved for filesystem metadata.

BTW, most of the functionality in your program can be accomplished using df and du:

# du -a --block-size=1 mnt/
2379776 mnt/img0.jpg
3441664 mnt/img1.jpg
2124800 mnt/img2.jpg
12288   mnt/lost+found
7959552 mnt/
# df -B1 mnt/
Filesystem     1B-blocks     Used Available Use% Mounted on
/dev/loop0      50763776 12969984  35172352  27% /tmp/mnt

Incidentally, the Ext4 test filesystem displayed above was created using the default mkfs options on a 50MB image file. It has a block size of 1,024 bytes, 12,824 128-byte inodes which consume 1,603 KB and a 4096-block journal that uses 4,096KB. A further 199 blocks are reserved for the group descriptor tables, according to tune2fs.

thkala answered Sep 29 '22 07:09


The inodes are probably not counted in your sum, and they may themselves hold some small data.

If a file is sparse, its size is larger than the space it actually occupies.

If a file is hard-linked more than once, a common inode is shared.

A paper about Ext4 is here, by Kumar et al

Basile Starynkevitch answered Sep 29 '22 07:09