Python 3 CSV file giving UnicodeDecodeError: 'utf-8' codec can't decode byte error when I print

I have the following code in Python 3, which is meant to print out each line in a csv file.

    import csv

    with open('my_file.csv', 'r', newline='') as csvfile:
        lines = csv.reader(csvfile, delimiter=',', quotechar='|')
        for line in lines:
            print(' '.join(line))

But when I run it, it gives me this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 7386: invalid start byte

I looked through the csv file, and it turns out that if I take out a single ñ (little n with a tilde on top), every line prints out fine.

My problem is that I've looked through a bunch of different solutions to similar problems, but I still have no idea how to fix this, what to decode/encode, etc. Simply taking out the ñ character in the data is NOT an option.

asked Feb 01 '14 22:02

HLH

2 Answers

We know the file contains the byte b'\x96' since it is mentioned in the error message:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 7386: invalid start byte
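To see the offending byte in context, one option is to read the data in binary mode and look at the bytes around the reported position. The following is only a sketch: the helper name `context_around`, the window size, and the sample bytes are illustrative assumptions, not part of the original file.

```python
def context_around(data: bytes, position: int, window: int = 20) -> bytes:
    """Return the raw bytes surrounding `position` (hypothetical helper)."""
    start = max(0, position - window)
    return data[start:position + window + 1]


# Stand-in for the file contents (assumed sample data); with a real file
# you would use open('my_file.csv', 'rb').read() instead.
sample = 'name,city\nJuan,Espa'.encode('ascii') + b'\x96' + b'a\n'

try:
    sample.decode('utf-8')
except UnicodeDecodeError as err:
    # err.start is the offset of the first byte that could not be decoded.
    bad_position = err.start
    snippet = context_around(sample, bad_position, window=5)
    print(bad_position, snippet)
```

With the real file, the position in the error message (7386 in the question) can be passed to the helper directly.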

Now we can write a little script to find out if there are any encodings where b'\x96' decodes to ñ:

    import pkgutil
    import encodings
    import os

    def all_encodings():
        # Every codec module shipped in the standard library's encodings package
        modnames = set([modname for importer, modname, ispkg in pkgutil.walk_packages(
            path=[os.path.dirname(encodings.__file__)], prefix='')])
        # Plus every canonical name that an alias points to
        aliases = set(encodings.aliases.aliases.values())
        return modnames.union(aliases)

    text = b'\x96'
    for enc in all_encodings():
        try:
            msg = text.decode(enc)
        except Exception:
            # Not a text codec, or the byte is invalid in this encoding
            continue
        if msg == 'ñ':
            print('Decoding {t} with {enc} is {m}'.format(t=text, enc=enc, m=msg))

which yields

    Decoding b'\x96' with mac_roman is ñ
    Decoding b'\x96' with mac_farsi is ñ
    Decoding b'\x96' with mac_croatian is ñ
    Decoding b'\x96' with mac_arabic is ñ
    Decoding b'\x96' with mac_romanian is ñ
    Decoding b'\x96' with mac_iceland is ñ
    Decoding b'\x96' with mac_turkish is ñ

Therefore, try changing

    with open('my_file.csv', 'r', newline='') as csvfile:

so that it specifies one of those encodings, such as:

    with open('my_file.csv', 'r', encoding='mac_roman', newline='') as csvfile:
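Put together, the question's loop with an explicit encoding might look like the sketch below. To keep it self-contained it writes a small temporary file first; the sample bytes and the choice of `mac_roman` are assumptions based on the error message, so check them against the real data.

```python
import csv
import os
import tempfile

# Build a small stand-in for my_file.csv whose ñ is stored as the single
# byte 0x96, as in the question (assumption: the real file is Mac Roman).
raw = b'id,name\n1,Pe\x96a\n'
with tempfile.NamedTemporaryFile(suffix='.csv', delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

rows = []
try:
    # Same call as in the question, plus an explicit encoding.
    with open(path, 'r', encoding='mac_roman', newline='') as csvfile:
        for line in csv.reader(csvfile, delimiter=',', quotechar='|'):
            rows.append(line)
            print(' '.join(line))
finally:
    # Clean up the temporary stand-in file.
    os.remove(path)
```

Passing `encoding=` explicitly also makes the script independent of the platform's default encoding.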

answered Sep 22 '22 02:09

unutbu

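    with open('my_file.csv', 'r', newline='', encoding='ISO-8859-1') as csvfile:

The ñ in this file is not stored as valid UTF-8, which is why decoding fails. To avoid the error, you may use the ISO-8859-1 encoding instead, which accepts every byte value. Be aware that ISO-8859-1 maps the byte 0x96 to a control character rather than to ñ, so the text may not display as intended. For more details about this encoding, you may refer to the link below: https://www.ic.unicamp.br/~stolfi/EXPORT/www/ISO-8859-1-Encoding.html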
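As a quick check of what a given encoding does with the byte from the error message, the same byte can be decoded under a few candidate codecs. This is a small sketch; the three codecs compared here are chosen only for illustration.

```python
byte = b'\x96'

# Decode the same byte under several codecs from the standard library.
decoded = {enc: byte.decode(enc) for enc in ('mac_roman', 'iso-8859-1', 'cp1252')}

for enc, ch in decoded.items():
    # ascii() keeps control characters visible instead of printing them raw.
    print(enc, ascii(ch))

# For comparison: the byte that ISO-8859-1 itself uses for ñ.
latin1_n_tilde = 'ñ'.encode('iso-8859-1')
```

A codec that decodes without raising is not necessarily the one the file was written in, so it is worth checking that the decoded character is the expected one.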

answered Sep 22 '22 02:09

Sir Markpo

Tags: python, python-3.x, csv, encoding, utf-8
