
Handling conversion from bytes to string when not explicitly opening a file in Python 3

I am using the Requests module to authorise and then pull CSV content from a web API, and I have it running fine in Python 2.7. I now want to write the same script in Python 3.5, but I am running into this error:

"iterator should return strings, not bytes (did you open the file in text mode?)"

The response from requests.get seems to yield bytes rather than strings, which appears to be related to the bytes/str changes in Python 3.x. The error is raised on the third-from-last line, next(reader). In Python 2.7 this was not an issue because the csv functions were handled in 'wb' mode.

This article is very similar, but as I'm not opening a CSV file directly, I can't seem to force the response text to be encoded this way: csv.Error: iterator should return strings, not bytes

import csv
import requests

countries = ['UK','US','CA']
datelist = [1,2,3,4]
baseurl = 'https://somewebsite.com/exporttoCSV.php'

#--- For all date/cc combinations
for cc in countries:
    for d in datelist:

        #---Build API String with variables
        url = (baseurl + '?data=chart&output=csv' +
               '&dataset=' + str(d) +
               '&cc=' + cc)

        #---Run API Call and create reader object
        r = requests.get(url, auth=(username, password))
        text = r.iter_lines()
        reader = csv.reader(text,delimiter=',')

        #---Write csv output to csv file with territory and date columns
        with open(cc + '_' + str(d) + '.csv', 'wt', newline='') as file:
            a = csv.writer(file)
            a.writerow(['position','id','title','kind','peers','territory','date']) #---Write header line
            next(reader) #---Skip original headers
            for i in reader:
                a.writerow(i +[countrydict[cc]] + [datevalue])
asked Jul 08 '16 11:07 by Steve




2 Answers

Without being able to test your exact scenario, I believe this should be solved by changing text = r.iter_lines() to:

text = (line.decode('utf-8') for line in r.iter_lines())

This should decode each line yielded by r.iter_lines() from a byte string to a str, which is what csv.reader expects in Python 3. It assumes the response is UTF-8 encoded.

My test case is as follows:

>>> import csv
>>> iter_lines = [b'1,2,3,4',b'2,3,4,5',b'3,4,5,6']
>>> text = (line.decode('utf-8') for line in iter_lines)
>>> reader = csv.reader(text, delimiter=',')
>>> next(reader)
['1', '2', '3', '4']
>>> for i in reader:
...     print(i)
...
['2', '3', '4', '5']
['3', '4', '5', '6']
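
The same fix can be wrapped in a small helper. The sketch below is only an illustration: decoded_lines is a hypothetical name, the raw list stands in for r.iter_lines(), and UTF-8 is an assumption about the response encoding. It first reproduces the error by handing bytes to csv.reader, then parses the decoded lines.

```python
import csv


def decoded_lines(byte_lines, encoding="utf-8"):
    """Yield str lines from an iterable of bytes lines (hypothetical helper)."""
    for line in byte_lines:
        yield line.decode(encoding)


# Stand-in for r.iter_lines(); the second row contains a non-ASCII character.
raw = [b"position,id,title", b"1,42,caf\xc3\xa9"]

# Handing bytes straight to csv.reader reproduces the error from the question.
try:
    next(csv.reader(raw))
    error_message = None
except csv.Error as exc:
    error_message = str(exc)

# Decoding each line first gives csv.reader the str input it expects.
rows = list(csv.reader(decoded_lines(raw)))
```

If the API declares a different charset, passing r.encoding (when it is set) instead of the default would be the natural adjustment.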
answered Oct 07 '22 17:10 by Bamcclur



Some files have to be read in as bytes, for example a Django SimpleUploadedFile, a class commonly used in tests whose content is bytes. Here is some example code from my test suite showing how I got it working:

test_code.py

import os
from django.core.files.uploadedfile import SimpleUploadedFile
from django.test import TestCase

class ImportDataViewTests(TestCase):

    def setUp(self):
        self.path = "test_in/example.csv"
        self.filename = os.path.split(self.path)[1]

    def test_file_upload(self):
        with open(self.path, 'rb') as infile:
            _file = SimpleUploadedFile(self.filename, infile.read())

        # now an `InMemoryUploadedFile` exists, so test it as you shall!

prod_code.py

import csv

def import_records(self, infile):
    csvfile = (line.decode('utf8') for line in infile)
    reader = csv.DictReader(csvfile)

    for row in reader:
        # loop through file and do stuff!
        pass
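
An alternative to the generator expression, when the input is a binary file-like object, is to wrap it in io.TextIOWrapper so that csv.DictReader sees text. This is a minimal sketch, not the method used above: read_records is a hypothetical name, io.BytesIO stands in for the uploaded file, and UTF-8 is assumed.

```python
import csv
import io


def read_records(binary_file, encoding="utf-8"):
    """Return the CSV rows of a binary file-like object as a list of dicts."""
    # newline="" leaves line endings untouched so the csv module can handle them.
    text = io.TextIOWrapper(binary_file, encoding=encoding, newline="")
    try:
        return list(csv.DictReader(text))
    finally:
        # Detach so the caller's file object is not closed along with the wrapper.
        text.detach()


# Stand-in for an uploaded file opened in binary mode.
upload = io.BytesIO(b"id,title\n1,first\n2,second\n")
records = read_records(upload)
```

Whether a particular upload object can be wrapped this way depends on it exposing the usual binary file interface, so treat that as something to verify for your own case.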
answered Oct 07 '22 16:10 by Aaron Lelevier
