
Python: os.stat().st_size gives different value than du

Tags: python, linux

I'm writing a utility that walks a directory tree, computes the size of each directory from the files and subdirectories it contains, and stores the result. However, the sizes it computes don't match what du reports.

Here's my class, which automatically recurses through all sub-directories:

import os

class directory:
    '''
    Class that automatically traverses directories
    and builds a tree with size info
    '''
    def __init__(self, path, parent=None):
        # Normalize the path so it always ends with '/'
        if path[-1] != '/':
            self.path = path + '/'
        else:
            self.path = path
        self.size = 4096  # assumed size of the directory itself
        self.parent = parent
        self.children = []
        self.errors = []
        for i in os.listdir(self.path):
            try:
                # st_size is the apparent size in bytes; lstat does not follow links
                self.size += os.lstat(self.path + i).st_size
                if os.path.isdir(self.path + i) and not os.path.islink(self.path + i):
                    # Recurse into real subdirectories only
                    a = directory(self.path + i, self)
                    self.size += a.size
                    self.children.append(a)
            except OSError:
                # Record the full path of any entry that could not be read
                self.errors.append(self.path + i)

I have a directory of videos that I'm testing this program with:

>>> a = directory('/var/media/television/The Wire')
>>> a.size
45289964053

However, when I try the same with du, I get

$ du -sx /var/media/television/The\ Wire
44228824

The directories don't contain any links or anything special.

Could someone explain why os.stat() is giving weird size readings?

Platform:

  • Linux (Fedora 13)
  • Python 2.7
fandingo asked Nov 02 '10 17:11



1 Answer

Consider this file, foo:

-rw-rw-r-- 1 unutbu unutbu 25334 2010-10-31 12:55 foo

It consists of 25334 bytes.

tune2fs tells me foo resides on a filesystem with block size 4096 bytes:

% sudo tune2fs -l /dev/mapper/vg1-OS1
...
Block size:               4096
...

Thus, even the smallest non-empty file on this filesystem will occupy 4096 bytes, even if its contents consist of just 1 byte. As the file grows larger, space is allocated in 4096-byte blocks.
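The rounding described above can be sketched in a few lines of Python. The helper name allocated_size is made up for illustration, and the 4096-byte default is simply the block size that tune2fs reported here; other filesystems may use a different value.

```python
def allocated_size(nbytes, block_size=4096):
    """Bytes occupied on disk if space is handed out in whole blocks.

    Hypothetical helper: block_size defaults to the 4096 bytes reported
    by tune2fs above and is an assumption for any other filesystem.
    """
    blocks = (nbytes + block_size - 1) // block_size  # ceiling division
    return blocks * block_size

print(allocated_size(1))      # 4096
print(allocated_size(25334))  # 28672
```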

du reports

% du -B1 foo
28672   foo

Note that 28672/4096 = 7. This says that foo occupies seven 4096-byte blocks on the filesystem, which is the smallest number of blocks that can hold 25334 bytes.

% du foo
28  foo

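By default, this version of du reports sizes in 1024-byte units, so it prints 28672/1024 = 28. The same holds for the du -sx figure in the question: 44228824 is a count of 1024-byte units, not bytes.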
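To get a number from Python that is comparable to du, one option is to sum st_blocks instead of st_size. On Linux, st_blocks counts allocated 512-byte units regardless of the filesystem block size. The following is only a sketch: tree_sizes is a hypothetical helper, it does not try to count hard-linked files once the way du does, and it does not stop at filesystem boundaries as du -x would.

```python
import os

def tree_sizes(root):
    """Return (apparent_bytes, allocated_bytes) for root and everything below it.

    Hypothetical helper. apparent_bytes sums st_size (what the class in the
    question adds up); allocated_bytes sums st_blocks * 512 (closer to du).
    """
    st = os.lstat(root)
    apparent = st.st_size
    allocated = st.st_blocks * 512  # st_blocks is counted in 512-byte units
    # os.walk does not descend into symlinked directories by default
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # unreadable or vanished entry: skip it
            apparent += st.st_size
            allocated += st.st_blocks * 512
    return apparent, allocated
```

Calling tree_sizes('/var/media/television/The Wire') and dividing the second value by 1024 should land close to the du -sx output, assuming no hard links and a single filesystem.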

unutbu answered Nov 07 '22 00:11
