Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

File size limit for read()?

I'm running into a problem while trying to load large files using Python 3.5. Using read() with no arguments sometimes gave an OSError: Invalid argument. I then tried reading only part of the file and it seemed to work fine. I've determined that it starts to fail somewhere around 2.2GB, below is the example code:

>>> sys.version
'3.5.1 (v3.5.1:37a07cee5969, Dec  5 2015, 21:12:44) \n[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]'
>>> x = open('/Users/username/Desktop/large.txt', 'r').read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument
>>> x = open('/Users/username/Desktop/large.txt', 'r').read(int(2.1*10**9))
>>> x = open('/Users/username/Desktop/large.txt', 'r').read(int(2.2*10**9))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument

I also noticed that this does not happen in Python 2.7. Here is the same code run in Python 2.7:

>>> sys.version
'2.7.10 (default, Aug 22 2015, 20:33:39) \n[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.1)]'
>>> x = open('/Users/username/Desktop/large.txt', 'r').read(int(2.1*10**9))
>>> x = open('/Users/username/Desktop/large.txt', 'r').read(int(2.2*10**9))
>>> x = open('/Users/username/Desktop/large.txt', 'r').read()
>>>

I am using OS X El Capitan 10.11.1.

Is this a bug or should use another method for reading the files?

like image 479
calico_ Avatar asked Dec 24 '16 17:12

calico_


1 Answers

Yes, you have bumped into a bug.

Good news is that someone else has also found it and already created an issue for it in the Python bug tracker, see: Issue24658 - open().write() fails on 2 GB+ data (OS X). This, seems, is platform depended (OS-X only) and is reproducible when using read and/or write. Apparently an issue exists with the way fread.c is implemented in the libc implementation for OS-X see here.

Bad News is that it is still open (and, currently, inactive) so, you'll have to wait until it is resolved. Either way, you can still take a look at the discussion there if you're interested for the specifics.


As a solution, I'm pretty sure you can side-step the issue until it is fixed by reading in chunks and chaining the chunks during processing. Do the same when writing. Unfortunate but, it might do the trick.

like image 135
Dimitris Fasarakis Hilliard Avatar answered Oct 13 '22 08:10

Dimitris Fasarakis Hilliard