Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert CSV to UTF-8 in Python

Tags:

python

csv

utf-8

I am trying to create a duplicate CSV without a header. When I attempt this I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 1895: invalid start byte.

I've read the python CSV documentation on Unicode and UTF-8 encoding and have implemented it. However, my output file is being generated with no data in it. Not sure what I am doing wrong here.

import csv

path =  '/Users/johndoe/file.csv'

with open(path, 'r') as infile, open(path + 'final.csv', 'w') as outfile:

    def unicode_csv(infile, outfile):
        inputs = csv.reader(utf_8_encoder(infile))
        output = csv.writer(outfile)

        for index, row in enumerate(inputs):
            yield [unicode(cell, 'utf-8') for cell in row]
            if index == 0:
                 continue
        output.writerow(row)

    def utf_8_encoder(infile):
        for line in infile:
            yield line.encode('utf-8')

unicode_csv(infile, outfile)
like image 239
user3062459 Avatar asked Dec 15 '22 12:12

user3062459


1 Answers

The solution was to simply include two additional parameters to the

with open(path, 'r') as infile:

The two parameters are encoding ='UTF-8' and errors='ignore'. This allowed me to create a duplicate of original CSV without the headers and without the UnicodeDecodeError. Below is the completed code.

import csv

path =  '/Users/johndoe/file.csv'

with open(path, 'r', encoding='utf-8', errors='ignore') as infile, open(path + 'final.csv', 'w') as outfile:
     inputs = csv.reader(infile)
     output = csv.writer(outfile)

     for index, row in enumerate(inputs):
         # Create file with no header
         if index == 0:
             continue
         output.writerow(row)
like image 152
user3062459 Avatar answered Jan 04 '23 04:01

user3062459