
How to convert a file to utf-8 in Python?

I need to convert a bunch of files to utf-8 in Python, and I have trouble with the "converting the file" part.

I'd like to do the equivalent of:

iconv -t utf-8 $file > converted/$file # this is shell code 

Thanks!

Sébastien RoccaSerra asked Oct 10 '08 13:10




1 Answer

You can use the codecs module, like this:

import codecs

BLOCKSIZE = 1048576 # or some other, desired size in bytes

with codecs.open(sourceFileName, "r", "your-source-encoding") as sourceFile:
    with codecs.open(targetFileName, "w", "utf-8") as targetFile:
        while True:
            contents = sourceFile.read(BLOCKSIZE)
            if not contents:
                break
            targetFile.write(contents)

EDIT: added BLOCKSIZE parameter to control file chunk size.
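
On Python 3, the same chunked loop can probably be written with the built-in open(), which accepts an encoding argument directly. The following is only a sketch under that assumption: the function name convert_to_utf8 and its parameter names are made up for illustration and are not part of any library, and you still have to know the source encoding yourself.

```python
def convert_to_utf8(source_path, target_path, source_encoding, blocksize=1048576):
    """Re-encode source_path into target_path as UTF-8, one chunk at a time.

    Hypothetical helper for illustration; source_encoding must be supplied
    by the caller, it is not detected automatically.
    """
    # newline="" disables newline translation, so line endings are copied as-is.
    with open(source_path, "r", encoding=source_encoding, newline="") as source_file:
        with open(target_path, "w", encoding="utf-8", newline="") as target_file:
            while True:
                # In text mode, read() counts characters rather than bytes.
                contents = source_file.read(blocksize)
                if not contents:
                    break
                target_file.write(contents)
```

For example, convert_to_utf8("input.txt", "converted/input.txt", "latin-1") would mirror the shell command from the question, assuming the converted directory already exists and the file really is Latin-1 (both names are placeholders).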

DzinX answered Oct 16 '22 02:10