I am using the Requests module to authorise and then pull CSV content from a web API, and the script runs fine in Python 2.7. I now want to write the same script in Python 3.5, but I'm getting this error:
"iterator should return strings, not bytes (did you open the file in text mode?)"
The response from requests.get seems to yield bytes rather than strings, which looks related to the encoding changes in Python 3.x. The error is raised on the third-from-last line, next(reader). In Python 2.7 this was not an issue because the csv functions were handled in 'wb' mode.
This question is very similar, but because I'm not opening a CSV file directly, I can't see how to force the response to be read as text: csv.Error: iterator should return strings, not bytes
import csv
import requests
# username, password, countrydict and datevalue are defined elsewhere in the script
countries = ['UK', 'US', 'CA']
datelist = [1, 2, 3, 4]
baseurl = 'https://somewebsite.com/exporttoCSV.php'
#--- For all date/cc combinations
for cc in countries:
    for d in datelist:
        #---Build API string with variables (d is an int, so convert it with str())
        url = (baseurl + '?data=chart&output=csv' +
               '&dataset=' + str(d) +
               '&cc=' + cc)
        #---Run API call and create reader object
        r = requests.get(url, auth=(username, password))
        text = r.iter_lines()
        reader = csv.reader(text, delimiter=',')
        #---Write csv output to csv file with territory and date columns
        with open(cc + '_' + str(d) + '.csv', 'wt', newline='') as file:
            a = csv.writer(file)
            a.writerow(['position', 'id', 'title', 'kind', 'peers', 'territory', 'date'])  #---Write header line
            next(reader)  #---Skip original headers
            for i in reader:
                a.writerow(i + [countrydict[cc]] + [datevalue])
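The failure can be reproduced without calling the web API. This is a minimal sketch, assuming the hard-coded byte strings stand in for what r.iter_lines() yields in Python 3; the sample rows and the function name are hypothetical.

```python
import csv


def reproduce_bytes_error():
    # Hypothetical stand-in for r.iter_lines(): in Python 3 it yields bytes
    byte_lines = [b'position,id,title', b'1,abc,Example']
    reader = csv.reader(byte_lines, delimiter=',')
    try:
        next(reader)  # csv checks the item type when it reads the first line
    except csv.Error as exc:
        return str(exc)
    return None


print(reproduce_bytes_error())
```

The exact wording in parentheses varies between Python versions, but the message starts with "iterator should return strings, not bytes".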
To convert a str to bytes, you can use either str.encode() or the bytes() constructor: encode() pairs symmetrically with bytes.decode(), while bytes(s, 'utf-8') may read as more object-oriented. Which one you choose is a matter of preference.
To write bytes to a file, open it in binary write mode ('wb') and pass a bytes object to write().
The bytes.decode() method converts bytes to a str object. Both encode() and decode() accept an errors argument that selects the error-handling scheme. The default, 'strict', raises UnicodeEncodeError when encoding fails and UnicodeDecodeError when decoding fails.
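A short sketch of those conversions follows. The sample string "café" and the invalid byte sequence b'caf\xe9' (a Latin-1 "é", which is not valid UTF-8) are illustrative assumptions.

```python
def encoding_round_trip(s="café"):
    # str.encode() and the bytes() constructor produce the same bytes
    via_encode = s.encode("utf-8")
    via_constructor = bytes(s, "utf-8")
    # bytes.decode() reverses the conversion
    back = via_encode.decode("utf-8")
    return via_encode, via_constructor, back


def decode_with_errors(data):
    # 'strict' (the default) raises UnicodeDecodeError on invalid input
    try:
        strict = data.decode("utf-8")
    except UnicodeDecodeError:
        strict = None
    # 'replace' substitutes U+FFFD for bytes that cannot be decoded
    replaced = data.decode("utf-8", errors="replace")
    return strict, replaced


print(encoding_round_trip())
print(decode_with_errors(b"caf\xe9"))
```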
Without being able to test your exact scenario, I believe this should be solved by changing
text = r.iter_lines()
to:
text = (line.decode('utf-8') for line in r.iter_lines())
This decodes each line yielded by r.iter_lines() from a byte string into a str that csv.reader can use.
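Applied to the question's loop body, the fix might look like the sketch below. It assumes the response is UTF-8; the function name write_territory_csv and its parameters are hypothetical, and territory and date play the role of countrydict[cc] and datevalue in the question.

```python
import csv


def write_territory_csv(byte_lines, out_path, territory, date):
    """Decode byte lines (as yielded by r.iter_lines()) and rewrite them as CSV."""
    # Decode lazily so csv.reader receives str objects instead of bytes
    text = (line.decode('utf-8') for line in byte_lines)
    reader = csv.reader(text, delimiter=',')
    with open(out_path, 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(['position', 'id', 'title', 'kind', 'peers', 'territory', 'date'])
        next(reader, None)  # skip the original header row; None avoids StopIteration on empty input
        for row in reader:
            if row:  # blank lines parse as [], so skip them
                writer.writerow(row + [territory, date])
```

Inside the question's loop it would be called as write_territory_csv(r.iter_lines(), cc + '_' + str(d) + '.csv', countrydict[cc], datevalue).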
My test case is as follows:
>>> import csv
>>> iter_lines = [b'1,2,3,4', b'2,3,4,5', b'3,4,5,6']
>>> text = (line.decode('utf-8') for line in iter_lines)
>>> reader = csv.reader(text, delimiter=',')
>>> next(reader)
['1', '2', '3', '4']
>>> for i in reader:
...     print(i)
...
['2', '3', '4', '5']
['3', '4', '5', '6']
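One caveat: r.iter_lines() splits the body on line breaks before csv sees it, so a quoted field that contains a newline may be mangled. If the response is small enough to hold in memory, an alternative sketch is to decode the whole body and give csv.reader a file-like object. Here the body argument is assumed to be r.text (or r.content.decode('utf-8') if the server does not declare an encoding); the function name and sample data are hypothetical.

```python
import csv
import io


def rows_from_text(body):
    # io.StringIO with newline='' leaves line endings untranslated, which the
    # csv module needs to keep a quoted multi-line field together
    return list(csv.reader(io.StringIO(body, newline=''), delimiter=','))


print(rows_from_text('id,title\r\n1,"two\nlines"\r\n'))
```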
Some files have to be read in as bytes; for example, Django's SimpleUploadedFile, a testing class, only accepts bytes content. Here is some example code from my test suite showing how I got it working:
test_code.py
import os
from django.core.files.uploadedfile import SimpleUploadedFile
from django.test import TestCase

class ImportDataViewTests(TestCase):
    def setUp(self):
        self.path = "test_in/example.csv"
        self.filename = os.path.split(self.path)[1]

    def test_file_upload(self):
        with open(self.path, 'rb') as infile:
            _file = SimpleUploadedFile(self.filename, infile.read())
            # now an `InMemoryUploadedFile` exists, so test it as you shall!
prod_code.py
import csv

def import_records(infile):
    # infile yields bytes lines, so decode each one before csv sees it
    csvfile = (line.decode('utf8') for line in infile)
    reader = csv.DictReader(csvfile)
    for row in reader:
        # loop through file and do stuff!
        pass
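To exercise the decoding step without Django, io.BytesIO can stand in for the uploaded file, since iterating it also yields bytes lines. This is a sketch under that assumption; the function name load_rows, the encoding parameter and the sample data are hypothetical, and returning a list is just for illustration.

```python
import csv
import io


def load_rows(infile, encoding='utf-8'):
    # Decode each bytes line, then let DictReader map rows to the header names
    csvfile = (line.decode(encoding) for line in infile)
    return list(csv.DictReader(csvfile))


sample = io.BytesIO(b'name,count\r\nwidget,3\r\ngadget,5\r\n')
print(load_rows(sample))
```

If uploads may come from Excel with a byte-order mark, passing encoding='utf-8-sig' strips the mark from the first header name; lines without a BOM decode as plain UTF-8.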