I am using the shutil.disk_usage() function to find the current disk usage of a particular path (amount available, used, etc.). As far as I can tell, this is a wrapper around os.statvfs() calls. It is not giving the answers I'd expect when compared with the output of "du" on Linux.
I have obscured some of the paths below for company privacy reasons, but the output and code are otherwise undoctored. I am using Python 3.3.2 64-bit version.
#!/apps/python/3.3.2_64bit/bin/python3
# test of shutil.disk_usage()
import shutil
BytesPerGB = 1024 * 1024 * 1024
(total, used, free) = shutil.disk_usage("/data/foo/")
print ("Total: %.2fGB" % (float(total)/BytesPerGB))
print ("Used: %.2fGB" % (float(used)/BytesPerGB))
(total1, used1, free1) = shutil.disk_usage("/data/foo/utils/")
print ("Total: %.2fGB" % (float(total1)/BytesPerGB))
print ("Used: %.2fGB" % (float(used1)/BytesPerGB))
Which outputs:
/data/foo/drivecode/me % disk_usage_test.py
Total: 609.60GB
Used: 291.58GB
Total: 609.60GB
Used: 291.58GB
As you can see, the main problem is that I would expect the second "Used" amount to be much smaller, since /data/foo/utils/ is a subdirectory of /data/foo/.
/data/foo/drivecode/me % du -sh /data/foo/utils
2.0G /data/foo/utils
As much as I trust "du," I find it hard to believe the Python module would be incorrect either. So perhaps it is just my understanding of Linux filesystems that could be the issue. :)
I wrote a module (based heavily on someone's code here at SO) which recursively gets the disk_usage, which I was using until now. It appears to match the "du" output but is MUCH, much slower than the shutil.disk_usage() function, so I'm hoping I can make that one work.
Thanks much in advance.
The problem is that shutil uses the statvfs system call underneath to determine the space used. This system call has no file-path granularity as far as I'm aware, only file-system granularity. This means the path you provide only identifies which file system to query; it does not restrict the result to the files under that path.
In other words, you gave it the path /data/foo/utils, it determined which file system backs that path, and then it queried that file system as a whole. This becomes apparent when you consider how the used value is computed in shutil:
used = (st.f_blocks - st.f_bfree) * st.f_frsize
Where:
fsblkcnt_t f_blocks; /* size of fs in f_frsize units */
fsblkcnt_t f_bfree; /* # free blocks */
unsigned long f_frsize; /* fragment size */
This is why it's giving you the total space used on the entire file system.
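To make the formula above concrete, here is a minimal sketch that calls os.statvfs directly and applies the same arithmetic. The function name statvfs_usage is made up for illustration, and the sketch assumes a Unix system, since os.statvfs is not available on Windows.

```python
import os
import shutil


def statvfs_usage(path):
    """Return (total, used, free) in bytes for the file system backing path.

    Hypothetical helper: the path is only used to locate the file system,
    so any two paths on the same file system give the same totals.
    """
    st = os.statvfs(path)  # Unix only; describes the whole file system
    total = st.f_blocks * st.f_frsize
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    free = st.f_bavail * st.f_frsize  # blocks available to unprivileged users
    return total, used, free
```

Calling statvfs_usage("/data/foo/") and statvfs_usage("/data/foo/utils/") should print the same totals whenever both directories sit on the same file system, which matches the output in the question.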
Indeed, it seems to me that the du command itself also traverses the file structure and adds up the file sizes. Here is the GNU coreutils du command's source code.
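A du-style traversal can be sketched in Python as well. This is only an approximation, not a port of du: it sums apparent sizes (st_size) rather than allocated blocks, ignores the space taken by the directories themselves, and counts hard-linked files once. The name tree_size is hypothetical.

```python
import os


def tree_size(root):
    """Sum the apparent sizes (st_size) of non-directory entries under root."""
    total = 0
    seen = set()  # (device, inode) pairs, so hard links are counted once
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.lstat(full)  # lstat: do not follow symlinks
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            total += st.st_size
    return total
```

Because every entry has to be visited, the run time grows with the number of files, which is why this approach is so much slower than a single statvfs call.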
shutil.disk_usage returns the disk usage of the file system (i.e. the mount point) that backs the path, not the space taken by the files under that path. It is the equivalent of running df /path/to/mount, not du /path/to/files. Notice that for both directories you got exactly the same usage.
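One way to predict when two paths will report identical numbers is to check whether they live on the same file system. The sketch below compares the device IDs reported by os.stat; the helper name same_filesystem is invented for this example, and the check is a heuristic (bind mounts and some file systems with subvolumes can complicate it).

```python
import os


def same_filesystem(path_a, path_b):
    """Return True if both paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

If same_filesystem("/data/foo/", "/data/foo/utils/") returns True, shutil.disk_usage can be expected to return the same figures for both, as df would.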
From the docs: "Return disk usage statistics about the given path as a named tuple with the attributes total, used and free, which are the amount of total, used and free space, in bytes."
Update for anyone stumbling upon this after 2013:
Depending on your Python version and OS, shutil.disk_usage might support both files and directories for the path argument. Here's the breakdown:
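Windows: directories only before Python 3.8; files and directories from 3.8 onward, per the "Changed in version 3.8" note in the documentation.
Unix: files and directories; the path only has to lie within a mounted file system.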
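If the same code has to run on interpreters where a file path may be rejected, one possible workaround is to query the containing directory instead. This is a sketch under that assumption; disk_usage_any is a hypothetical name, and it presumes the file and its parent directory are on the same file system, which is the usual case.

```python
import os
import shutil


def disk_usage_any(path):
    """Call shutil.disk_usage on path, or on its parent directory if path is not a directory."""
    if not os.path.isdir(path):
        # Fall back to the directory that contains the file.
        path = os.path.dirname(os.path.abspath(path))
    return shutil.disk_usage(path)
```

The result is still a statistic for the whole file system, not for the file or directory passed in.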