I need to convert a bunch of files to utf-8 in Python, and I have trouble with the "converting the file" part.
I'd like to do the equivalent of:
iconv -t utf-8 $file > converted/$file # this is shell code
Thanks!
Click Tools, then select Web options. Go to the Encoding tab. In the dropdown for Save this document as: choose Unicode (UTF-8). Click Ok.
setdefaultencoding() is purposely removed from sys when Python starts. Reenabling it and changing the default encoding can break code that relies on ASCII being the default (this code can be third-party, which would generally make fixing it impossible or dangerous).
You can use the codecs module, like this:
import codecs BLOCKSIZE = 1048576 # or some other, desired size in bytes with codecs.open(sourceFileName, "r", "your-source-encoding") as sourceFile: with codecs.open(targetFileName, "w", "utf-8") as targetFile: while True: contents = sourceFile.read(BLOCKSIZE) if not contents: break targetFile.write(contents)
EDIT: added BLOCKSIZE
parameter to control file chunk size.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With