 

csv.reader read from Requests stream: iterator should return strings, not bytes

I'm trying to stream a response into csv.reader using requests.get(url, stream=True) in order to handle quite large data feeds. My code worked fine with Python 2.7. Here's the code:

import csv

import requests

# url, delimiter and quotechar are defined elsewhere in my code
response = requests.get(url, stream=True)
ret = csv.reader(response.iter_lines(decode_unicode=True), delimiter=delimiter,
                 quotechar=quotechar, dialect=csv.excel_tab)
for line in ret:
    line.get('name')

Unfortunately, after migrating to Python 3.6 I got the following error:

_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

I was trying to find some wrapper/decorator that would convert the result of the response.iter_lines() iterator from bytes to strings, but no luck with that. I already tried the io package and also codecs. Using codecs.iterdecode doesn't split the data into lines; it is probably split by chunk_size instead, and in that case csv.reader complains in the following way:

_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
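
For reference, this is roughly how the codecs-based attempt might have looked (a minimal sketch with a placeholder URL and encoding; the exact call isn't shown above). The decoded chunks are cut at chunk_size boundaries rather than at newlines, which is what triggers the error quoted above:

import codecs
import csv

import requests

url = "https://example.com/feed.csv"  # placeholder URL

response = requests.get(url, stream=True)
# iter_content() yields fixed-size byte chunks; codecs.iterdecode() decodes
# them to str but does not re-split them on newlines, so each item handed
# to csv.reader may contain several lines (or a partial one).
decoded_chunks = codecs.iterdecode(response.iter_content(chunk_size=1024), 'utf-8')
reader = csv.reader(decoded_chunks, delimiter=',', quotechar='"')
for row in reader:
    print(row)  # raises the new-line error above on multi-line chunks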
asked Sep 21 '16 by mdargacz

1 Answer

I'm guessing you could wrap this in a genexp and feed the decoded lines to csv.reader:

import csv
from contextlib import closing

import requests

with closing(requests.get(url, stream=True)) as r:
    # Decode each line of bytes to str before it reaches csv.reader.
    f = (line.decode('utf-8') for line in r.iter_lines())
    reader = csv.reader(f, delimiter=',', quotechar='"')
    for row in reader:
        print(row)

Using some sample data in Python 3.5 this satisfies csv.reader: every line fed to it is first decoded in the genexp. I'm also using closing from contextlib, as is generally suggested, to automatically close the response.
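
As a side note, since the question mentions trying the io package: another approach that should work is wrapping the raw byte stream in io.TextIOWrapper so that csv.reader receives a real text stream. A minimal sketch, assuming a UTF-8 feed and a placeholder URL (note that response.raw does not decompress gzip-encoded responses by default):

import csv
import io
from contextlib import closing

import requests

url = "https://example.com/feed.csv"  # placeholder URL

with closing(requests.get(url, stream=True)) as r:
    # r.raw is the underlying urllib3 byte stream; TextIOWrapper exposes it
    # as a line-oriented text stream that csv.reader can consume directly.
    text_stream = io.TextIOWrapper(r.raw, encoding='utf-8', newline='')
    reader = csv.reader(text_stream, delimiter=',', quotechar='"')
    for row in reader:
        print(row)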

answered Oct 06 '22 by Dimitris Fasarakis Hilliard