Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to download huge Oracle LOB with cx_Oracle on memory constrained system?

I'm developing part of a system where processes are limited to about 350MB of RAM; we use cx_Oracle to download files from an external system for processing.

The external system stores files as BLOBs, and we can grab them doing something like this:

# ... set up Oracle connection, then
cursor.execute(u"""SELECT   filename, data, filesize
                   FROM    FILEDATA
                   WHERE   ID = :id""", id=the_one_you_wanted)
filename, lob, filesize = cursor.fetchone()

with open(filename, "w") as the_file:
    the_file.write(lob.read())

lob.read() will obviously fail with MemoryError when we hit a file larger than 300-350MB, so we've tried something like this instead of reading it all at once:

read_size = 0
chunk_size = lob.getchunksize() * 100
while read_size < filesize:
    data = lob.read(chunk_size, read_size + 1)
    read_size += len(data)
    the_file.write(data)

Unfortunately, we still get MemoryError after several iterations. From the time lob.read() is taking, and the out-of-memory condition we eventually get, it looks as if lob.read() is pulling ( chunk_size + read_size ) bytes from the database every time. That is, reads are taking O(n) time and O(n) memory, even though the buffer is quite a bit smaller.

To work around this, we've tried something like:

read_size = 0
while read_size < filesize:
    q = u'''SELECT dbms_lob.substr(data, 2000, %s)
            FROM FILEDATA WHERE ID = :id''' % (read_bytes + 1)
    cursor.execute(q, id=filedataid[0])
    row = cursor.fetchone()
    read_bytes += len(row[0])
    the_file.write(row[0])

This pulls 2000 bytes (argh) at a time, and takes forever (something like two hours for a 1.5GB file). Why 2000 bytes? According to the Oracle docs, dbms_lob.substr() stores its return value in a RAW, which is limited to 2000 bytes.

Is there some way I can store the dbms_lob.substr() results in a larger data object and read maybe a few megabytes at a time? How do I do this with cx_Oracle?

like image 780
Chris Herborth Avatar asked Oct 01 '12 14:10

Chris Herborth


People also ask

Do you need Oracle client for cx_Oracle?

Using cx_Oracle requires Oracle Client libraries to be installed. These provide the necessary network connectivity allowing cx_Oracle to access an Oracle Database instance. Oracle Client versions 19, 18, 12 and 11.2 are supported.

What is cx_Oracle lob?

BLOB - Binary Large Object, used for storing binary data. cx_Oracle uses the type cx_Oracle. DB_TYPE_BLOB . CLOB - Character Large Object, used for string strings in the database character set format. cx_Oracle uses the type cx_Oracle.

What is Python cx_Oracle?

cx_Oracle is a Python extension module that enables access to Oracle Database. It conforms to the Python database API 2.0 specification with a considerable number of additions and a couple of exclusions. cx_Oracle 8.3 was tested with Python versions 3.6 through 3.10.


1 Answers

I think that the argument order in lob.read() is reversed in your code. The first argument should be the offset, the second argument should be the amount to read. This would explain the O(n) time and memory usage.

like image 138
Curt Avatar answered Sep 18 '22 15:09

Curt