
Acceptable use of os.open/read/write/close?

Tags: python, file

I intend to frequently read/write small pieces of information from many different files. The following somewhat contrived example shows substantially less time taken when using os operations for acting directly on file descriptors. Am I missing any downside other than the convenience of file objects?

import os
import time

N = 10000
PATH = "/tmp/foo.test"

def testOpen():
    for i in range(N):
        with open(PATH, "wb") as fh:
            fh.write("A")

    for i in range(N):
        with open(PATH, "rb") as fh:
            s = fh.read()

def testOsOpen():
    for i in range(N):
        fd = os.open(PATH, os.O_CREAT | os.O_WRONLY)
        try:
            os.write(fd, "A")
        finally:
            os.close(fd)

    for i in range(N):
        fd = os.open(PATH, os.O_RDONLY)
        try:
            s = os.read(fd, 1)
        finally:
            os.close(fd)

if __name__ == "__main__":
    for fn in testOpen, testOsOpen:
        start = time.time()
        fn()
        print fn.func_name, "took", time.time() - start

Sample run:

$ python bench.py 
testOpen took 1.82302999496
testOsOpen took 0.436559915543
Ben asked Nov 09 '22 11:11



1 Answer

I'll answer just so this doesn't stay open forever ;-)

There's really little to say: as you already noted, a file object is more convenient. In some cases it's also more functional; for example, it does its own layer of buffering to speed up line-oriented text operations (like file_object.readline()). (BTW, that's one reason it's slower too.) And a file object strives to work the same way across all platforms.
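
To make the buffering point concrete, here is a minimal Python 3 sketch (not part of the original benchmark). The scratch file and its contents are assumptions made up for illustration. It reads the first line once through a file object and once through a raw descriptor, where the splitting has to be done by hand:

```python
import os
import tempfile

# Hypothetical scratch file holding a few short lines.
fd, path = tempfile.mkstemp()
os.write(fd, b"alpha\nbeta\ngamma\n")
os.close(fd)

# File object: readline() is served from the object's internal buffer.
with open(path, "rb") as fh:
    first_line = fh.readline()

# Raw descriptor: there is no readline(), so read a chunk and split it ourselves.
fd = os.open(path, os.O_RDONLY)
try:
    chunk = os.read(fd, 4096)  # a single read of up to 4096 bytes
finally:
    os.close(fd)
first_line_fd = chunk.split(b"\n", 1)[0] + b"\n"

os.remove(path)
```

The hand-rolled version only works here because the whole file fits in one chunk; handling lines that straddle chunk boundaries is exactly the bookkeeping the file object does for you.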

But if you don't need/want that, there's nothing at all wrong with using the lower-level & zippier os file descriptor functions instead. There are many of the latter, and not all are supported on all platforms, and not all options are supported on all platforms. Of course you're responsible for restricting yourself to a subset of operations & options in the intersection of the platforms you care about (which is generally true of all functions in os, not just its file descriptor functions - the name os is a strong hint that the stuff it contains may be OS-dependent).
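
One common way to stay inside that intersection is to look up platform-specific flags defensively. The sketch below is Python 3 and assumes a throwaway path in a temporary directory; it treats O_BINARY (which exists only on Windows) as optional. It also adds O_TRUNC, since open(PATH, "wb") truncates an existing file while O_CREAT | O_WRONLY alone does not:

```python
import os
import tempfile

# O_BINARY exists only on Windows; fall back to 0 (no effect) elsewhere.
O_BINARY = getattr(os, "O_BINARY", 0)

# Hypothetical target path inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "foo.test")

# O_TRUNC mirrors the truncating behaviour of open(path, "wb").
flags = os.O_CREAT | os.O_WRONLY | os.O_TRUNC | O_BINARY
fd = os.open(path, flags)
try:
    written = os.write(fd, b"A")  # returns the number of bytes written
finally:
    os.close(fd)
```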

With respect to Pythons 2 and 3, the differences are due to the strong distinction Python 3 makes between "text" and "binary" modes on all platforms. It's a Unicode world, and "text mode" for file objects makes no sense without specifying the intended encoding. In Python 3, a file object read method returns a str object (a Unicode string) if the file was opened in "text mode", but a bytes object if in "binary mode". Similarly for write methods.
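
A small Python 3 sketch of that distinction, assuming a throwaway file and UTF-8 as the chosen encoding (both are illustrative choices, not something the question specifies):

```python
import os
import tempfile

# Hypothetical target path inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "foo.test")

# Text mode: write a str, with the encoding stated explicitly.
with open(path, "w", encoding="utf-8") as fh:
    fh.write("A")

# Text mode read returns str; binary mode read returns bytes.
with open(path, "r", encoding="utf-8") as fh:
    as_text = fh.read()
with open(path, "rb") as fh:
    as_bytes = fh.read()
```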

Because the os file descriptor methods have no notion of encoding, they can only work with bytes-like objects in Python 3 (regardless of whether, e.g., on Windows, the file descriptor was opened with the low-level os.open() O_BINARY or O_TEXT flags).

In practice, in the example you gave, this just means you would have to change instances of

"A"

to

b"A"

Note that you can also use the b"..." literal syntax in a recent-enough version of Python 2, although it's still just a string literal in Python 2. In Python 3 it denotes a different kind of object (bytes), and file descriptor functions are restricted to writing and returning bytes-like objects.
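
As a quick check under Python 3, the sketch below (using a temporary file as a stand-in for the question's PATH) records that os.write() rejects a str and accepts bytes, and that os.read() hands back bytes:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()  # stand-in for the question's PATH
try:
    try:
        os.write(fd, "A")      # str: rejected in Python 3
        str_accepted = True
    except TypeError:
        str_accepted = False
    written = os.write(fd, b"A")  # bytes: accepted
finally:
    os.close(fd)

fd = os.open(path, os.O_RDONLY)
try:
    data = os.read(fd, 1)      # always a bytes object
finally:
    os.close(fd)
os.remove(path)
```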

But if you're working with "binary data", that's no restriction at all. If you're working with "text data", it may be (not enough info about your specifics to guess).
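
If the data does turn out to be text, one possible approach is to encode and decode explicitly at the descriptor boundary. This is a Python 3 sketch that assumes UTF-8 and a made-up sample string; the right encoding depends on your files:

```python
import os
import tempfile

text = "na\u00efve caf\u00e9"   # sample text with two non-ASCII characters
payload = text.encode("utf-8")  # descriptors only accept bytes

fd, path = tempfile.mkstemp()
try:
    os.write(fd, payload)
finally:
    os.close(fd)

fd = os.open(path, os.O_RDONLY)
try:
    raw = os.read(fd, len(payload))  # size is counted in bytes, not characters
finally:
    os.close(fd)
os.remove(path)

round_tripped = raw.decode("utf-8")
```

Note that read sizes are in bytes, so a fixed-size os.read() can cut a multi-byte character in half; that is one of the details a text-mode file object handles for you.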

Tim Peters answered Nov 15 '22 13:11
