I am trying to create a duplicate CSV without a header. When I attempt this I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 1895: invalid start byte.
I've read the Python csv documentation on Unicode and UTF-8 encoding and have implemented it.
However, my output file is being generated with no data in it. I'm not sure what I am doing wrong here.
import csv

path = '/Users/johndoe/file.csv'

with open(path, 'r') as infile, open(path + 'final.csv', 'w') as outfile:

    def unicode_csv(infile, outfile):
        inputs = csv.reader(utf_8_encoder(infile))
        output = csv.writer(outfile)

        for index, row in enumerate(inputs):
            yield [unicode(cell, 'utf-8') for cell in row]
            if index == 0:
                continue
            output.writerow(row)

    def utf_8_encoder(infile):
        for line in infile:
            yield line.encode('utf-8')

    unicode_csv(infile, outfile)
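One likely reason the output file stays empty: unicode_csv contains a yield, so Python treats it as a generator function. Calling it only creates a generator object; the body, including the writerow call, does not run until something iterates over the result. Here is a minimal sketch of that behavior. write_rows is a hypothetical stand-in for unicode_csv, and a plain list stands in for the output file.

```python
def write_rows(rows, sink):
    """Hypothetical stand-in for unicode_csv: the yield makes this a generator function."""
    for index, row in enumerate(rows):
        yield row
        if index == 0:
            continue  # skip the header row
        sink.append(row)  # stands in for output.writerow(row)


rows = [["name", "age"], ["Ann", "30"], ["Bob", "25"]]

# Calling the function only builds a generator object; nothing is written.
sink_not_iterated = []
write_rows(rows, sink_not_iterated)

# Iterating over the generator is what actually executes the body.
sink_iterated = []
for _ in write_rows(rows, sink_iterated):
    pass
```

After this runs, sink_not_iterated is still empty, while sink_iterated holds the two data rows.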
The solution was to pass two additional parameters to the open() call in
with open(path, 'r') as infile:
The two parameters are encoding='utf-8' and errors='ignore'. This allowed me to create a duplicate of the original CSV without the header and without the UnicodeDecodeError. Below is the completed code.
import csv

path = '/Users/johndoe/file.csv'

with open(path, 'r', encoding='utf-8', errors='ignore') as infile, open(path + 'final.csv', 'w') as outfile:
    inputs = csv.reader(infile)
    output = csv.writer(outfile)

    for index, row in enumerate(inputs):
        # Create file with no header
        if index == 0:
            continue
        output.writerow(row)
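One caveat: errors='ignore' silently drops every byte that is not valid UTF-8, so those characters are missing from the copy. Byte 0xa0 is a non-breaking space in Latin-1 and Windows-1252, so the file may not be UTF-8 at all. Below is a sketch of an alternative that decodes the input as Windows-1252 instead. That encoding is an assumption and is not confirmed anywhere in this post. copy_without_header and its parameters are hypothetical names. The sketch also skips the header with next() and opens both files with newline='', as the csv module documentation recommends.

```python
import csv


def copy_without_header(src_path, dst_path, src_encoding="cp1252"):
    """Copy a CSV file to dst_path, dropping the first row.

    src_encoding="cp1252" is an assumption about the input file;
    change it to match the file's real encoding.
    """
    with open(src_path, "r", encoding=src_encoding, newline="") as infile, \
         open(dst_path, "w", encoding="utf-8", newline="") as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        next(reader, None)  # skip the header row; no error if the file is empty
        writer.writerows(reader)  # copy the remaining rows unchanged
```

If the guessed encoding is wrong, the copy may contain incorrect characters rather than missing ones, so it is worth checking a few rows of the output by eye.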