How do I make the response from Python's requests package be a "file-like object"?

I am hitting a webservice with Python's requests library and the endpoint is returning a (very large) CSV file which I then want to stream into a database. The code looks like this:

response = requests.get(url, auth=auth, stream=True)
if response.status_code == 200:
    stream_csv_into_database(response)

Now when the database is a MongoDB database, the loading works perfectly using a DictReader:

def stream_csv_into_database(response):
    .
    .
    .
    for record in csv.DictReader(response.iter_lines(), delimiter='\t'):
        product_count += 1
        product = {k:v for (k,v) in record.iteritems() if v}
        product['_id'] = product_count
        collection.insert(product)

However, I am switching from MongoDB to Amazon RedShift, which I can already access using psycopg2: I can open connections and run simple queries without problems. What I want to do is take my streamed response from the webservice and use psycopg2's copy_expert to load the RedShift table. Here is what I have tried so far:

def stream_csv_into_database(response, campaign, config):
    print 'Loading product feed for {0}'.format(campaign)
    conn = new_redshift_connection(config) # My own helper, works fine.
    table = 'products.' + campaign
    cur = conn.cursor()
    reader = response.iter_lines()
    # Error on following line:
    cur.copy_expert("COPY {0} FROM STDIN WITH CSV HEADER DELIMITER '\t'".format(table), reader)
    conn.commit()
    cur.close()
    conn.close()

The error that I get is:

file must be a readable file-like object for COPY FROM; a writable file-like object for COPY TO.

I understand what the error is saying; in fact, I can see from the psycopg2 documentation that copy_expert calls copy_from, which:

Reads data from a file-like object appending them to a database table (COPY table FROM file syntax). The source file must have both read() and readline() method.

My problem is that I cannot find a way to make the response object be a file-like object! I tried both .data and .iter_lines without success. I certainly do not want to download the entire multi-gigabyte file from the webservice and then upload it to RedShift. There must be a way to use the streaming response as a file-like object that psycopg2 can copy into RedShift. Anyone know what I am missing?

Ray Toal asked Jul 17 '14 08:07




1 Answer

You could use the response.raw file object, but note that any content encoding (such as GZIP or Deflate compression) will still be in place unless you pass decode_content=True when calling .read(), which psycopg2 will not do.

You can set the flag on the raw file object to change the default to decompressing-while-reading:

response.raw.decode_content = True

and then pass the response.raw file object to csv.DictReader(), or to any other consumer that expects a file-like object, such as copy_expert().
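As a rough sketch of how this could fit together (not tested against a real RedShift cluster), the two helpers below set decode_content on response.raw and then hand it to a consumer. The function names copy_response_into_table and iter_records are made up for illustration, the UTF-8 encoding is an assumption about the feed, and the COPY statement is the one from the question, so it only works if the server accepts COPY ... FROM STDIN. The code is written for Python 3, where csv needs text rather than bytes, hence the io.TextIOWrapper.

```python
import csv
import io


def copy_response_into_table(response, cursor, table):
    """Hypothetical helper: stream a response body into `table` via COPY."""
    # Make urllib3 undo gzip/deflate while reading, because the consumer
    # calls read() without passing decode_content=True itself.
    response.raw.decode_content = True
    # Same statement as in the question; '\t' becomes a literal tab here.
    sql = "COPY {0} FROM STDIN WITH CSV HEADER DELIMITER '\t'".format(table)
    # response.raw is the file-like object that copy_expert reads from.
    cursor.copy_expert(sql, response.raw)


def iter_records(response, encoding="utf-8"):
    """Hypothetical helper: yield one dict per tab-separated row."""
    response.raw.decode_content = True
    # Assumption: the feed is UTF-8 text. The wrapper decodes the raw
    # bytes into str so that the csv module can parse them.
    text = io.TextIOWrapper(response.raw, encoding=encoding, newline="")
    for record in csv.DictReader(text, delimiter="\t"):
        yield record
```

With the code from the question, this would be called as copy_response_into_table(response, cur, table) right after requests.get(url, auth=auth, stream=True). Keeping stream=True matters, since otherwise the body is downloaded in full before response.raw is read.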

Martijn Pieters answered Oct 11 '22 03:10