I am hitting a webservice with Python's requests
library and the endpoint is returning a (very large) CSV file which I then want to stream into a database. The code looks like this:
    response = requests.get(url, auth=auth, stream=True)
    if response.status_code == 200:
        stream_csv_into_database(response)
Now, when the database is a MongoDB database, the loading works perfectly using a csv.DictReader:
    def stream_csv_into_database(response):
        ...
        for record in csv.DictReader(response.iter_lines(), delimiter='\t'):
            product_count += 1
            product = {k: v for (k, v) in record.iteritems() if v}
            product['_id'] = product_count
            collection.insert(product)
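(As an aside, csv.DictReader expects an iterable of text lines; under Python 3, response.iter_lines() yields bytes unless you pass decode_unicode=True. A minimal, self-contained sketch of just the parsing step, using an in-memory list in place of the streamed response and hypothetical column names:)

    import csv

    # Stand-in for response.iter_lines(decode_unicode=True): an iterable of text lines.
    lines = ["sku\tname\tprice", "A1\tWidget\t9.99"]

    # Same parsing step as the MongoDB loader above, minus the database insert.
    records = [{k: v for k, v in row.items() if v}
               for row in csv.DictReader(lines, delimiter='\t')]
    # records[0] == {'sku': 'A1', 'name': 'Widget', 'price': '9.99'}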
However, I am switching from MongoDB to Amazon Redshift, which I can already access just fine using psycopg2. I can open connections and make simple queries, but what I want to do is take my streamed response from the webservice and use psycopg2's copy_expert to load the Redshift table. Here is what I have tried so far:
    def stream_csv_into_database(response, campaign, config):
        print 'Loading product feed for {0}'.format(campaign)
        conn = new_redshift_connection(config)  # My own helper, works fine.
        table = 'products.' + campaign
        cur = conn.cursor()
        reader = response.iter_lines()
        # Error on the following line:
        cur.copy_expert("COPY {0} FROM STDIN WITH CSV HEADER DELIMITER '\t'".format(table), reader)
        conn.commit()
        cur.close()
        conn.close()
The error that I get is:

    file must be a readable file-like object for COPY FROM; a writable file-like object for COPY TO.
I understand what the error is saying; in fact, I can see from the psycopg2 documentation that copy_expert calls copy_from, which:

    Reads data from a file-like object appending them to a database table (COPY table FROM file syntax). The source file must have both read() and readline() method.
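(That requirement is easy to check for yourself: response.iter_lines() returns a generator, which has neither method, whereas any io stream does. A quick illustration with stand-in data:)

    import io

    lines = (line for line in [b"a\tb\n", b"1\t2\n"])  # roughly what iter_lines() returns
    print(hasattr(lines, 'read'))      # False - generators are not file-like

    buf = io.BytesIO(b"a\tb\n1\t2\n")  # a true file-like object
    print(hasattr(buf, 'read'), hasattr(buf, 'readline'))  # True True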
My problem is that I cannot find a way to make the response object be a file-like object! I tried both .data and .iter_lines without success. I certainly do not want to download the entire multi-gigabyte file from the webservice and then upload it to Redshift. There must be a way to use the streaming response as a file-like object that psycopg2 can copy into Redshift. Anyone know what I am missing?
You could use the response.raw file object, but take into account that any content encoding (such as gzip or deflate compression) will still be in place unless you set the decode_content flag to True when calling .read(), which psycopg2 will not do. You can set the flag on the raw file object to change the default to decompressing-while-reading:

    response.raw.decode_content = True

and then pass the response.raw file object to copy_expert() (the same object would also work for csv.DictReader()).
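(If for some reason response.raw cannot be used, for example because you need to transform lines on the fly, another option is a small adapter that gives any iterable of byte chunks the read()/readline() methods psycopg2 asks for. A sketch under that assumption, with a hypothetical IterStream name; note that iter_lines() strips the line terminators, so they must be re-appended:)

    import io

    class IterStream(io.RawIOBase):
        """Wrap an iterable of bytes chunks as a readable file-like object."""
        def __init__(self, iterable):
            self._iter = iter(iterable)
            self._leftover = b''

        def readable(self):
            return True

        def readinto(self, b):
            # Refill the internal buffer from the iterator as needed.
            while not self._leftover:
                try:
                    self._leftover = next(self._iter)
                except StopIteration:
                    return 0  # EOF
            n = min(len(b), len(self._leftover))
            b[:n] = self._leftover[:n]
            self._leftover = self._leftover[n:]
            return n

    # Wrapping in BufferedReader provides readline() on top of readinto().
    stream = io.BufferedReader(IterStream([b"a\tb\n", b"1\t2\n"]))
    # stream.readline() -> b'a\tb\n'; stream.read() -> b'1\t2\n'

(In the real loader you would build the file argument for copy_expert as io.BufferedReader(IterStream(line + b'\n' for line in response.iter_lines())).)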