While converting some code to use asyncio, I'd like to give control back to the asyncio.BaseEventLoop as quickly as possible. This means avoiding blocking waits.
Without asyncio I'd use os.stat() or pathlib.Path.stat() to obtain e.g. the filesize. Is there a way to do this efficiently with asyncio?
Can I just wrap the stat() call so it is a future similar to what's described here?
os.stat() translates to a stat syscall:
$ strace python3 -c 'import os; os.stat("/")'
[...]
stat("/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[...]
which is blocking, and there's no way to get a non-blocking stat syscall.
asyncio provides non-blocking I/O by using non-blocking system calls, which already exist (see man fcntl, with its O_NONBLOCK flag, or ioctl). So asyncio is not making syscalls asynchronous; it exposes already asynchronous syscalls in a nice way.
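To make the O_NONBLOCK point concrete, here is a minimal sketch on a pipe, assuming a Unix-like system. The helper names try_read and demo are made up for illustration; os.set_blocking(fd, False) sets O_NONBLOCK on the descriptor, so a read with no data available raises BlockingIOError instead of waiting. There is no comparable flag that changes how stat behaves.

```python
import os


def try_read(fd, size=1024):
    """Read from fd without waiting; return None if the read would block."""
    try:
        return os.read(fd, size)
    except BlockingIOError:
        # O_NONBLOCK is set and no data is available yet.
        return None


def demo():
    r, w = os.pipe()
    try:
        os.set_blocking(r, False)  # sets O_NONBLOCK on the read end
        before = try_read(r)       # nothing has been written yet
        os.write(w, b"hello")
        after = try_read(r)        # the data is now in the pipe buffer
        return before, after
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(demo())
```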
It's still possible to use the nice ThreadPoolExecutor abstraction to make your blocking stat calls in parallel using a pool of threads.
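A minimal sketch of that approach, assuming Python 3.7+ for asyncio.run and asyncio.get_running_loop. The names async_stat and file_sizes and the pool size of 4 are arbitrary choices for illustration, not part of asyncio. The blocking os.stat call runs in a worker thread via loop.run_in_executor, so the event loop stays free while the syscall is in progress.

```python
import asyncio
import os
from concurrent.futures import ThreadPoolExecutor


async def async_stat(path, executor=None):
    """Run the blocking os.stat() in a worker thread and await its result."""
    loop = asyncio.get_running_loop()
    # executor=None means "use the loop's default ThreadPoolExecutor".
    return await loop.run_in_executor(executor, os.stat, path)


async def file_sizes(paths):
    """Stat several paths concurrently and map each path to its size."""
    # A dedicated pool; max_workers=4 is an arbitrary value for this sketch.
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = await asyncio.gather(*(async_stat(p, pool) for p in paths))
    return {p: st.st_size for p, st in zip(paths, results)}


if __name__ == "__main__":
    print(asyncio.run(file_sizes([os.curdir, os.pardir])))
```

On Python 3.9 and later, await asyncio.to_thread(os.stat, path) should do the same thing with the default executor.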
But you may first consider some other parameters:
- With strace -T, stat is fast: stat("/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 <0.000007>, probably faster than starting and synchronizing threads.
- stat is probably I/O bound in many cases, so using more CPUs won't help.
- But there are also many situations where a thread pool can make your stats faster, for example if you're hitting a distributed file system.
You may also take a look at functools.lru_cache: if you're doing multiple stat on the same file or directory, and you're sure it has not changed, caching the result avoids a syscall.
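A small sketch of the caching idea, with a made-up helper name cached_size and an arbitrary maxsize of 128. It is only safe under the assumption stated above, that the file does not change while the result is cached.

```python
import functools
import os


@functools.lru_cache(maxsize=128)
def cached_size(path):
    """Return the size of path; repeated calls with the same path skip the syscall."""
    return os.stat(path).st_size


if __name__ == "__main__":
    cached_size(os.curdir)           # first call performs the stat syscall
    cached_size(os.curdir)           # second call is served from the cache
    print(cached_size.cache_info())  # reports cache hits and misses
```

If a file may have changed, cached_size.cache_clear() drops all cached entries so the next call performs a fresh stat.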
To conclude, "keep it simple": os.stat is the efficient way to get a filesize.