
'utf-8' codec can't decode byte when reading a file in Python 3.4 but not in Python 2.7

I was reading a file in Python 2.7, and it was read without problems. When I run the same program in Python 3.4, however, this error appears:

'utf-8' codec can't decode byte 0xf2 in position 424: invalid continuation byte

Also, when I run the program on Windows (with Python 3.4), the error doesn't appear. The first line of the document is: Codi;Codi_lloc_anonim;Nom

and the code of my program is:

def lectdict(filename,colkey,colvalue):
    f = open(filename,'r')
    D = dict()

    for line in f:
        if line == '\n':
            continue  # skip blank lines
        fields = line.split(';')
        # append this row's value to the list stored under its key
        D[fields[colkey]] = D.get(fields[colkey], []) + [fields[colvalue]]

    f.close()
    return D

Traduccio = lectdict('Noms_departaments_centres.txt',1,2)
oscarcapote asked Mar 05 '15 11:03


2 Answers

In my case I can't change the encoding, because my file really is UTF-8 encoded. But some rows are corrupted and cause the same error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 7092: invalid continuation byte

My solution was to open the file in binary mode:

open(filename, 'rb')
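
Note that a file opened in binary mode yields bytes rather than str, so each line still has to be decoded before it can be split on ';'. Here is a minimal sketch, assuming the goal is to keep the valid UTF-8 text and replace any corrupted bytes with the U+FFFD replacement character. The helper name read_lines_lenient is hypothetical, not something from the original answer.

```python
def read_lines_lenient(filename, encoding="utf-8"):
    """Return the file's lines as str, replacing undecodable bytes with U+FFFD.

    Hypothetical helper for illustration only.
    """
    lines = []
    with open(filename, "rb") as f:      # binary mode: no decoding happens here
        for raw in f:                    # raw is a bytes object ending in b"\n"
            # errors="replace" substitutes U+FFFD instead of raising
            # UnicodeDecodeError on a corrupted byte
            lines.append(raw.decode(encoding, errors="replace"))
    return lines
```

The same leniency is available in text mode with open(filename, 'r', encoding='utf-8', errors='replace'), which may be simpler if the raw bytes are never needed.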
dyomas answered Oct 26 '22 22:10


OK, I did what @unutbu told me to do. The result was a long list of encodings, one of which was cp1250, so I changed:

f = open(filename,'r')

to

f = open(filename,'r', encoding='cp1250')

as @triplee suggested. Now I can read my files.
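
The check that produced the list of encodings is not shown here. Presumably it was something like trying to decode the raw bytes with several encodings and keeping the ones that succeed. A rough sketch of that idea follows; the function name and the candidate list are assumptions chosen for illustration.

```python
def candidate_encodings(filename,
                        candidates=("utf-8", "cp1250", "cp1252", "latin-1")):
    """Return the encodings from `candidates` that decode the file without error.

    Hypothetical helper; the candidate list is an arbitrary example.
    """
    with open(filename, "rb") as f:
        data = f.read()                  # raw bytes, no decoding yet
    usable = []
    for enc in candidates:
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue                     # this encoding cannot represent the bytes
        usable.append(enc)
    return usable
```

A clean decode only shows that an encoding is possible, not that it is the right one (latin-1, for instance, accepts any byte sequence), so the decoded text should still be checked by eye. Once an encoding is chosen, passing it explicitly, as in open(filename, 'r', encoding='cp1250'), means the result no longer depends on the platform's default encoding. That default can differ between Windows and Linux.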

oscarcapote answered Oct 27 '22 00:10