I'm developing part of a system where processes are limited to about 350MB of RAM; we use cx_Oracle to download files from an external system for processing.
The external system stores files as BLOBs, and we can grab them doing something like this:
# ... set up Oracle connection, then
cursor.execute(u"""SELECT filename, data, filesize
FROM FILEDATA
WHERE ID = :id""", id=the_one_you_wanted)
filename, lob, filesize = cursor.fetchone()
with open(filename, "w") as the_file:
the_file.write(lob.read())
lob.read()
will obviously fail with MemoryError
when we hit a file larger than 300-350MB, so we've tried something like this instead of reading it all at once:
read_size = 0
chunk_size = lob.getchunksize() * 100
while read_size < filesize:
data = lob.read(chunk_size, read_size + 1)
read_size += len(data)
the_file.write(data)
Unfortunately, we still get MemoryError
after several iterations. From the time lob.read()
is taking, and the out-of-memory condition we eventually get, it looks as if lob.read()
is pulling ( chunk_size + read_size ) bytes from the database every time. That is, reads are taking O(n) time and O(n) memory, even though the buffer is quite a bit smaller.
To work around this, we've tried something like:
read_size = 0
while read_size < filesize:
q = u'''SELECT dbms_lob.substr(data, 2000, %s)
FROM FILEDATA WHERE ID = :id''' % (read_bytes + 1)
cursor.execute(q, id=filedataid[0])
row = cursor.fetchone()
read_bytes += len(row[0])
the_file.write(row[0])
This pulls 2000 bytes (argh) at a time, and takes forever (something like two hours for a 1.5GB file). Why 2000 bytes? According to the Oracle docs, dbms_lob.substr()
stores its return value in a RAW, which is limited to 2000 bytes.
Is there some way I can store the dbms_lob.substr()
results in a larger data object and read maybe a few megabytes at a time? How do I do this with cx_Oracle?
Using cx_Oracle requires Oracle Client libraries to be installed. These provide the necessary network connectivity allowing cx_Oracle to access an Oracle Database instance. Oracle Client versions 19, 18, 12 and 11.2 are supported.
BLOB - Binary Large Object, used for storing binary data. cx_Oracle uses the type cx_Oracle. DB_TYPE_BLOB . CLOB - Character Large Object, used for string strings in the database character set format. cx_Oracle uses the type cx_Oracle.
cx_Oracle is a Python extension module that enables access to Oracle Database. It conforms to the Python database API 2.0 specification with a considerable number of additions and a couple of exclusions. cx_Oracle 8.3 was tested with Python versions 3.6 through 3.10.
I think that the argument order in lob.read() is reversed in your code. The first argument should be the offset, the second argument should be the amount to read. This would explain the O(n) time and memory usage.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With