I'm looking through the source of StringIO, where some notes mention cStringIO, but say it's not subclassable. StringIO is just like an in-memory file object, so why is it slower than a real file object?
This is not actually about Python's interpreted nature: BytesIO is implemented in Python*, same as StringIO, but still beats file I/O.
In fact, StringIO is faster than file I/O under StringIO's ideal use case (a single write to the beginning of an empty buffer). Actually, if the write is big enough, it'll even beat cStringIO. See my question here.
So why is StringIO considered "slow"? StringIO's real problem is being backed by immutable sequences, whether str or unicode. This is fine if you only write once, obviously. But, as pointed out by tdelaney's answer to my question, it slows down a ton (like, 10-100x) when writing to random locations, since every time it gets a write in the middle it has to copy the entire backing sequence.
BytesIO doesn't have this problem, since it's backed by a (mutable) bytearray instead. Likewise, whatever cStringIO does, it seems to handle random writes much more easily. I'd guess that it breaks the immutability rule internally, since C strings are mutable.
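To make the immutable-versus-mutable point concrete, here is a rough sketch of what a write in the middle costs in each case. This is not StringIO's or BytesIO's actual code; the helper names write_immutable and write_mutable are made up for illustration, and the buffer size and repeat count are arbitrary.

```python
import timeit


def write_immutable(buf, pos, data):
    """Overwrite part of an immutable bytes buffer.

    The old object cannot be changed, so a whole new buffer is built,
    which copies every byte of the original.
    """
    return buf[:pos] + data + buf[pos + len(data):]


def write_mutable(buf, pos, data):
    """Overwrite part of a bytearray in place; only len(data) bytes change."""
    buf[pos:pos + len(data)] = data
    return buf


SIZE = 1000000  # arbitrary buffer size for the comparison
immutable = b"x" * SIZE
mutable = bytearray(b"x" * SIZE)

# Both approaches produce the same contents...
immutable = write_immutable(immutable, 500, b"hello")
write_mutable(mutable, 500, b"hello")
assert immutable == bytes(mutable)

# ...but the immutable version pays for a full copy on every write.
t_imm = timeit.timeit(lambda: write_immutable(immutable, 500, b"hello"), number=200)
t_mut = timeit.timeit(lambda: write_mutable(mutable, 500, b"hello"), number=200)
print("immutable: %.4fs, mutable: %.4fs" % (t_imm, t_mut))
```

The exact timings depend on the machine, but the immutable version's cost should grow with the size of the buffer, while the in-place write only depends on the size of the data written.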
* Well, the version in _pyio is, anyway. The standard library version in io is written in C.
Python's file handling is implemented entirely in C. This means that it's quite fast (at least in the same order of magnitude as native C code).
The StringIO library, however, is written in Python. The module itself is thus interpreted, with the associated performance penalties.
As you know, there is another module, cStringIO, with a similar interface, which you can use in performance-sensitive code. It isn't subclassable because it's written in C.