Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Removing non-ascii characters in a csv file

I am currently inserting data in my django models using csv file. Below is a simple save function that am using:

def save(self):
myfile = file.csv
data = csv.reader(myfile, delimiter=',', quotechar='"')
i=0
for row in data:
    if i == 0:
        i = i + 1
        continue    #skipping the header row        

    b=MyModel()
    b.create_from_csv_row(row) # calls a method to save in models

The function is working perfectly with ascii characters. However, if the csv file has some non-ascii characters then, an error is raised: UnicodeDecodeError 'ascii' codec can't decode byte 0x93 in position 1526: ordinal not in range(128)

My question is: How can i remove non-ascii characters before saving my csv file to avoid this error.

Thanks in advance.

like image 790
Njogu Mbau Avatar asked Dec 16 '22 07:12

Njogu Mbau


2 Answers

If you really want to strip it, try:

import unicodedata

unicodedata.normalize('NFKD', title).encode('ascii','ignore')

* WARNING THIS WILL MODIFY YOUR DATA * It attempts to find a close match - i.e. ć -> c

Perhaps a better answer is to use unicodecsv instead.

----- EDIT ----- Okay, if you don't care that the data is represented at all, try the following:

# If row references a unicode string
b.create_from_csv_row(row.encode('ascii', 'ignore'))

If row is a collection, not a unicode string, you will need to iterate over the collection to the string level to re-serialize it.

like image 198
DivinusVox Avatar answered Jan 04 '23 01:01

DivinusVox


If you want to remove non-ascii characters from your data then iterate through your data and keep only the ascii.

for item in data:
     if ord(item) <= 128: # 1 - 128 is ascii
          [append,write,print,whatever]

If you want to convert unicode characters to ascii, then the response above by DivinusVox is accurate.

like image 44
jabgibson Avatar answered Jan 04 '23 01:01

jabgibson