I intend to frequently read/write small pieces of information from many different files. The following somewhat contrived example shows substantially less time taken when using os operations to act directly on file descriptors. Am I missing any downside other than the convenience of file objects?
import os
import time
N = 10000
PATH = "/tmp/foo.test"
def testOpen():
    # N write/read round trips through built-in file objects.
    for i in range(N):
        with open(PATH, "wb") as fh:
            fh.write("A")
    for i in range(N):
        with open(PATH, "rb") as fh:
            s = fh.read()

def testOsOpen():
    # The same N round trips, acting directly on file descriptors.
    for i in range(N):
        fd = os.open(PATH, os.O_CREAT | os.O_WRONLY)
        try:
            os.write(fd, "A")
        finally:
            os.close(fd)
    for i in range(N):
        fd = os.open(PATH, os.O_RDONLY)
        try:
            s = os.read(fd, 1)
        finally:
            os.close(fd)

if __name__ == "__main__":
    for fn in testOpen, testOsOpen:
        start = time.time()
        fn()
        print fn.func_name, "took", time.time() - start
Sample run:
$ python bench.py
testOpen took 1.82302999496
testOsOpen took 0.436559915543
I'll answer just so this doesn't stay open forever ;-)
There's really little to say: as you already noted, a file object is more convenient. In some cases it's also more functional; for example, it does its own layer of buffering to speed line-oriented text operations (like file_object.readline()). (BTW, that buffering is one reason it's slower, too.) And a file object strives to work the same way across all platforms.
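As an aside, the two styles aren't mutually exclusive: if you start from a raw descriptor but decide you want the buffering and readline() after all, os.fdopen() wraps the descriptor in a regular file object. A minimal sketch, reusing the path from your example:

import os

fd = os.open("/tmp/foo.test", os.O_RDONLY)
with os.fdopen(fd, "rb") as fh:  # the file object takes ownership of fd and closes it
    first_line = fh.readline()   # buffered, line-oriented read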
But if you don't need/want that, there's nothing at all wrong with using the lower-level & zippier os file descriptor functions instead. There are many of the latter, not all of them are supported on all platforms, and not all of their options are supported on all platforms either. Of course you're responsible for restricting yourself to the subset of operations & options in the intersection of the platforms you care about (which is generally true of all functions in os, not just its file descriptor functions - the name os is a strong hint that the stuff it contains may be OS-dependent).
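For example, os.O_BINARY and os.O_TEXT exist only on Windows, so code that should run on both Windows and POSIX has to guard for them. A minimal sketch of one common way to do that (the fallback of 0 simply means "no extra flag"):

import os

O_BIN = getattr(os, "O_BINARY", 0)  # only defined on Windows; 0 is a no-op elsewhere

fd = os.open("/tmp/foo.test", os.O_CREAT | os.O_WRONLY | O_BIN)
try:
    os.write(fd, b"A")
finally:
    os.close(fd)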
With respect to Pythons 2 and 3, the differences are due to the strong distinction Python 3 makes between "text" and "binary" modes on all platforms. It's a Unicode world, and "text mode" for file objects makes no sense without specifying the intended encoding. In Python 3, a file object's read method returns a str object (a Unicode string) if the file was opened in "text mode", but a bytes object if in "binary mode". Similarly for write methods.

Because the os file descriptor functions have no notion of encoding, they can only work with bytes-like objects in Python 3 (regardless of whether, e.g., on Windows, the file descriptor was opened with the low-level os.open() O_BINARY or O_TEXT flags).
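To make the distinction concrete, here's a minimal sketch under Python 3, using the same path as your example; text mode hands back str, while binary mode and the descriptor-level os.read() hand back bytes:

import os

with open("/tmp/foo.test", "w", encoding="utf-8") as fh:
    fh.write("A")                # text mode: write a str

with open("/tmp/foo.test", "rb") as fh:
    data = fh.read()             # binary mode: returns bytes, here b"A"

fd = os.open("/tmp/foo.test", os.O_RDONLY)
try:
    raw = os.read(fd, 1)         # descriptor level: also bytes, b"A"
finally:
    os.close(fd)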
In practice, in the example you gave, this just means you would have to change instances of "A" to b"A".
Note that you can also use the b"..." literal syntax in a recent-enough version of Python 2, although it's still just a string literal there. In Python 3 it denotes a different kind of object (bytes), and file descriptor functions are restricted to writing and returning bytes-like objects.

But if you're working with "binary data", that's no restriction at all. If you're working with "text data", it may be (not enough info about your specifics to guess).
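Concretely, the descriptor-based half of your benchmark needs only that one change to run under Python 3 as well (the file-object version's "wb" writes need the same b"A" treatment, and print becomes a function). A sketch, with your original constants:

import os

N = 10000
PATH = "/tmp/foo.test"

def testOsOpen():
    for i in range(N):
        fd = os.open(PATH, os.O_CREAT | os.O_WRONLY)
        try:
            os.write(fd, b"A")   # bytes literal: works on Python 2.6+ and Python 3
        finally:
            os.close(fd)
    for i in range(N):
        fd = os.open(PATH, os.O_RDONLY)
        try:
            s = os.read(fd, 1)   # always returns bytes on Python 3
        finally:
            os.close(fd)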