
Open a text file with accents in Python

Tags:

python

utf-8

I am trying to open a French text file with Python 2.7. I used the command

f = open('textfr', 'r')

but when I call

f.read()

I lose the accented characters: I get u"J'\xc3\xa9tais \xc3\xa0 Paris" instead of J'étais à Paris, and so on.

When I run this in a Linux terminal:

file -i textfr

I get

charset=utf-8

so I do not understand what is going wrong.

Mostafa asked Dec 21 '14 18:12




3 Answers

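You need to specify the encoding. In Python 2, the built-in open() returns undecoded bytes, whereas io.open() decodes them to unicode text:

import io
f = io.open('textfr', 'r', encoding='utf-8')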
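
The bytes shown in the question are the UTF-8 encoding of the accented letters; they become readable text once decoded. A minimal sketch, assuming a throwaway temporary file in place of textfr (the path and the sample sentence are illustrative, not taken from the asker's machine):

```python
import io
import os
import tempfile

# Hypothetical sample file standing in for 'textfr'.
path = os.path.join(tempfile.mkdtemp(), 'textfr')
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u"J'\u00e9tais \u00e0 Paris")

# Binary mode returns the raw UTF-8 bytes, undecoded.
with io.open(path, 'rb') as f:
    raw = f.read()

# Passing encoding= makes io.open() decode those bytes into text.
with io.open(path, 'r', encoding='utf-8') as f:
    text = f.read()

# The two accented letters take two bytes each, so raw is longer than text.
print(len(raw), len(text))
```

io.open() is available in Python 2.6+ and is the same function as the built-in open() in Python 3, so this form should carry over unchanged when moving between versions.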

Ignacio Vazquez-Abrams answered Oct 22 '22 22:10


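In Python 3, text files are read and written using the platform's preferred encoding, which is reported by locale.getpreferredencoding() (not by sys.getdefaultencoding(), which only governs implicit string conversions). In Python 2, the built-in open() does not decode at all and returns raw bytes. On many machines the preferred encoding is UTF-8, but not on all of them, so it is safer to pass the file's encoding explicitly. Your file is UTF-8, so use that: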

In Python 3:

with open('somefile.txt', 'rt', encoding='utf-8') as f:
    text = f.read()  # text is a str with the accents decoded

In Python 2 you can use codecs.open():

import codecs
with codecs.open('somefile.txt', 'r', encoding='utf-8') as f:
    text = f.read()  # text is a unicode string
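
To check which encoding open() would fall back to on a given machine, the settings can be printed directly. This is a sketch that assumes Python 3, where open() without encoding= normally uses locale.getpreferredencoding(False); sys.getdefaultencoding() is shown only for contrast, and the variable names are illustrative.

```python
import locale
import sys

# Encoding that open() uses in text mode when encoding= is omitted (Python 3).
file_default = locale.getpreferredencoding(False)

# Encoding used for implicit str/bytes conversions; open() does not consult it.
implicit_default = sys.getdefaultencoding()

print("open() default:", file_default)
print("sys.getdefaultencoding():", implicit_default)
```

If the first value is not UTF-8 on your system, reading a UTF-8 file without encoding='utf-8' is likely to garble or reject the accented characters.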

Mazdak answered Oct 23 '22 00:10


Use codecs.open() instead of the built-in open():

import codecs
f = codecs.open('textfr', 'r', 'utf-8')
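
If the file contents have already been read as bytes, the same text can be recovered by decoding explicitly. A small sketch, using the byte sequence from the question; the variable names are illustrative:

```python
# Raw bytes as a binary-mode read of the file in the question would return them.
raw = b"J'\xc3\xa9tais \xc3\xa0 Paris"

# Decode the UTF-8 bytes into a text (unicode) string.
text = raw.decode('utf-8')

# Each accented letter is one character in text but two bytes in raw.
print(len(raw), len(text))
```

This is the conversion that codecs.open() and io.open() perform for you while reading.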

sax answered Oct 23 '22 00:10