I'm running into a problem while trying to load large files using Python 3.5. Using read() with no arguments sometimes gave an OSError: Invalid argument. I then tried reading only part of the file and it seemed to work fine. I've determined that it starts to fail somewhere around 2.2GB. Below is the example code:
>>> import sys
>>> sys.version
'3.5.1 (v3.5.1:37a07cee5969, Dec 5 2015, 21:12:44) \n[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]'
>>> x = open('/Users/username/Desktop/large.txt', 'r').read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument
>>> x = open('/Users/username/Desktop/large.txt', 'r').read(int(2.1*10**9))
>>> x = open('/Users/username/Desktop/large.txt', 'r').read(int(2.2*10**9))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument
I also noticed that this does not happen in Python 2.7. Here is the same code run in Python 2.7:
>>> import sys
>>> sys.version
'2.7.10 (default, Aug 22 2015, 20:33:39) \n[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.1)]'
>>> x = open('/Users/username/Desktop/large.txt', 'r').read(int(2.1*10**9))
>>> x = open('/Users/username/Desktop/large.txt', 'r').read(int(2.2*10**9))
>>> x = open('/Users/username/Desktop/large.txt', 'r').read()
>>>
I am using OS X El Capitan 10.11.1.
Is this a bug, or should I use another method for reading the files?
Yes, you have bumped into a bug.
The good news is that someone else has also found it and already created an issue for it in the Python bug tracker; see Issue24658 - open().write() fails on 2 GB+ data (OS X). It seems to be platform dependent (OS X only) and is reproducible when using read and/or write. Apparently the problem lies in the way fread.c is implemented in the libc for OS X; see here.
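As a rough sanity check on the threshold reported in the question: I'm assuming here (it is an assumption on my part, not something the question confirms) that the limit involved is the largest signed 32-bit integer. If so, the arithmetic lines up with a read of 2.1*10**9 succeeding and one of 2.2*10**9 failing:

```python
# Assumption: the failing boundary is the largest signed 32-bit integer.
INT_MAX = 2**31 - 1

working = int(2.1 * 10**9)  # size that succeeded in the question
failing = int(2.2 * 10**9)  # size that raised OSError in the question

print(INT_MAX)
# True if the assumed limit falls between the two sizes that were tried
print(working <= INT_MAX < failing)
```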
The bad news is that it is still open (and currently inactive), so you'll have to wait until it is resolved. Either way, you can still take a look at the discussion there if you're interested in the specifics.
As a workaround, I'm pretty sure you can side-step the issue until it is fixed by reading in chunks, each well under 2 GB, and chaining the chunks during processing. Do the same when writing. Unfortunate, but it might do the trick.
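Here is a minimal sketch of that chunked approach. The names read_in_chunks, write_in_chunks and CHUNK_SIZE are made up for illustration, and the 1 GiB chunk size is just an assumption picked to stay well under the size that fails. It works in binary mode, so you would decode the result yourself if you need text:

```python
CHUNK_SIZE = 2**30  # 1 GiB per call; hypothetical value, well below 2 GB


def read_in_chunks(path, chunk_size=CHUNK_SIZE):
    """Return the file's contents as bytes, asking for at most chunk_size bytes per read."""
    parts = []
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            parts.append(chunk)
    return b''.join(parts)


def write_in_chunks(path, data, chunk_size=CHUNK_SIZE):
    """Write a bytes-like object to path, passing at most chunk_size bytes per write."""
    view = memoryview(data)  # slicing a memoryview avoids copying each chunk
    with open(path, 'wb') as f:
        for start in range(0, len(view), chunk_size):
            f.write(view[start:start + chunk_size])
```

Something like x = read_in_chunks('/Users/username/Desktop/large.txt').decode() would then stand in for the single read() call. If the whole file doesn't need to be in memory at once, processing each chunk inside the loop instead of joining them is kinder to RAM.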