I'm trying to write a custom extraction method for Babel to extract strings from a specific column in a CSV file. I followed the Babel documentation on writing extraction methods.
Here is my extraction method code:
def extract_csv(fileobj, keywords, comment_tags, options):
    import csv
    reader = csv.DictReader(fileobj, delimiter=',')
    for row in reader:
        if row and row['caption'] != '':
            yield (reader.line_num, '', row['caption'], '')
When I try to run the extraction, I get this error:
  File "/Users/tiagosilva/repos/naltio/csv_extractor.py", line 18, in extract_csv
    for row in reader:
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/csv.py", line 111, in __next__
    self.fieldnames
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/csv.py", line 98, in fieldnames
    self._fieldnames = next(self.reader)
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
It seems the fileobj that is passed to the function was opened in binary mode.
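The error can be reproduced without Babel. This is a minimal sketch that assumes, as the traceback suggests, that the extractor receives a binary file object; an `io.BytesIO` with made-up sample data stands in for Babel's `fileobj`.

```python
import csv
import io

# io.BytesIO is a stand-in for the binary file object Babel passes as fileobj.
fileobj = io.BytesIO(b"id,caption\n1,Hello\n")

reader = csv.DictReader(fileobj, delimiter=",")
error_message = None
try:
    # The header row is read lazily, on the first iteration, which is
    # where csv notices it is being fed bytes instead of str.
    list(reader)
except csv.Error as exc:
    error_message = str(exc)

print(error_message)
```

The exact wording of the message varies slightly between Python versions, but it is raised as `csv.Error` in each case.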
How can I make this work? I can think of two possible solutions, but I don't know how to code them:
1) Is there a way to use a binary file object with DictReader?
2) Is there a way to signal Babel to open the file in text mode?
I'm also open to other solutions not listed here.
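One further option, not taken from the original post: `csv.DictReader` accepts any iterable of strings, so the binary lines can be decoded on the fly with `codecs.iterdecode` instead of wrapping the file in a `TextIOWrapper`. This sketch assumes UTF-8 input unless an `encoding` key is present in Babel's `options` dict, and the sample data is hypothetical.

```python
import codecs
import csv
import io


def extract_csv(fileobj, keywords, comment_tags, options):
    """Yield Babel message tuples for the 'caption' column of a CSV file."""
    # Iterating a binary file yields bytes lines; iterdecode turns them into str.
    lines = codecs.iterdecode(fileobj, options.get("encoding", "utf-8"))
    reader = csv.DictReader(lines, delimiter=",")
    for row in reader:
        # Skip rows whose caption is missing or empty.
        if row.get("caption"):
            # (lineno, funcname, message, comments)
            yield (reader.line_num, "", row["caption"], [])


# Usage with an in-memory stand-in for Babel's binary file object.
sample = io.BytesIO("id,caption\n1,Olá\n2,\n".encode("utf-8"))
messages = list(extract_csv(sample, keywords={}, comment_tags=[], options={}))
print(messages)  # [(2, '', 'Olá', [])]
```

Quoted fields that span several lines still work here, because the csv reader joins continuation lines itself.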
Python CSV DictReader: the csv.DictReader class operates like a regular reader but maps each row it reads into a dictionary. The dictionary keys can be passed in with the fieldnames parameter; if fieldnames is omitted, they are taken from the first row of the CSV file, so the first line of the file supplies the keys.
A csv.DictReader is an iterator that produces each row as needed. To get all of the rows into a list, wrap the reader in list(), e.g. rows = list(reader).
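The two ways of supplying keys can be compared side by side. This is a small illustrative sketch; the CSV text and the field names `key` and `text` are made up.

```python
import csv
import io

data = "id,caption\n1,Hello\n2,World\n"

# fieldnames omitted: the header row supplies the keys and is not returned.
rows = list(csv.DictReader(io.StringIO(data)))
print(rows[0]["caption"])  # Hello

# fieldnames given: the header line is treated as an ordinary data row.
explicit = list(csv.DictReader(io.StringIO(data), fieldnames=["key", "text"]))
print(explicit[0]["text"])  # caption
```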
I actually found a way to do it!
It's solution 1, handling the binary file: wrap the binary file object in an io.TextIOWrapper, which decodes the bytes into text, and pass that wrapper to DictReader.
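import csv
import io

def extract_csv(fileobj, keywords, comment_tags, options):
    # Babel hands over fileobj in binary mode; decode it to text for csv.
    encoding = options.get('encoding', 'utf-8')
    with io.TextIOWrapper(fileobj, encoding=encoding, newline='') as text_file:
        reader = csv.DictReader(text_file, delimiter=',')
        for row in reader:
            # Skip rows whose caption is missing or empty.
            if row.get('caption'):
                # Babel expects (lineno, funcname, message, comments).
                yield (reader.line_num, '', row['caption'], [])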
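The extractor reports `reader.line_num` as each message's line number. The sketch below, using hypothetical in-memory data, shows what that value is once the bytes go through a `TextIOWrapper`: the header counts as line 1, and for a quoted caption spanning several lines it is the line on which the record ends.

```python
import csv
import io

# Hypothetical sample: the second caption is quoted and spans two lines.
raw = io.BytesIO(b'id,caption\n1,Hello\n2,"Two\nlines"\n3,Bye\n')

text_file = io.TextIOWrapper(raw, encoding="utf-8", newline="")
reader = csv.DictReader(text_file)
# line_num is read right after each row is parsed, so it is the row's last line.
positions = [(reader.line_num, row["caption"]) for row in reader]
print(positions)  # [(2, 'Hello'), (4, 'Two\nlines'), (5, 'Bye')]
```

If the first line of a multi-line record is preferred, one option is to record `reader.line_num + 1` before reading each row; whether that matters depends on how the locations in the generated catalog are used.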