I have around 600,000 files encoded in ANSI, and I want to convert them to UTF-8. I can do that individually in Notepad++, but I can't do that for 600,000 files. Can I do this in R or Python?
I have found this link, but the Python script is not running:
notepad++ converting ansi encoded file to utf-8
Why don't you read the file with its original encoding and write it back out as UTF-8? You can do that in Python. Note that you must read with the *source* encoding ('mbcs' is Python's name for the Windows "ANSI" code page), not with UTF-8:

# to support encodings
import codecs

# read the input file using its ANSI source encoding
with codecs.open(path, 'r', encoding='mbcs') as file:
    lines = file.read()

# write the output file as UTF-8
with codecs.open(path, 'w', encoding='utf8') as file:
    file.write(lines)
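For 600,000 files you would wrap that read/write step in a loop over the directory tree. A minimal sketch, assuming the files sit under a hypothetical folder `ansi_files/`, carry a `.txt` extension, and use the Windows-1252 ("ANSI") code page; adjust the extension filter and encoding for your data:

```python
import os

def convert_tree(root, src_enc="cp1252"):
    """Convert every .txt file under root from src_enc to UTF-8, in place."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".txt"):
                continue
            path = os.path.join(dirpath, name)
            # Read with the source encoding, then rewrite the same file as UTF-8.
            with open(path, "r", encoding=src_enc) as f:
                text = f.read()
            with open(path, "w", encoding="utf-8") as f:
                f.write(text)

convert_tree("ansi_files")  # hypothetical folder name
```

Since the conversion is in place, it is worth testing on a copy of a few files first: a file that is already UTF-8 would be mis-decoded as cp1252 and silently corrupted by a second pass.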
I appreciate that this is an old question, but having just resolved a similar problem recently, I thought I would share my solution.
I had a file being prepared by one program that I needed to import into an sqlite3 database, but the text file was always 'ANSI' and sqlite3 requires UTF-8.
The ANSI encoding is recognised as 'mbcs' in Python (on Windows), and therefore the code I have used, adapted from something else I found, is:
blockSize = 1048576
with codecs.open("your ANSI source file.txt", "r", encoding="mbcs") as sourceFile:
    with codecs.open("Your UTF-8 output file.txt", "w", encoding="UTF-8") as targetFile:
        while True:
            contents = sourceFile.read(blockSize)
            if not contents:
                break
            targetFile.write(contents)
The link below contains some information on the encoding types that I found during my research:
https://docs.python.org/2.4/lib/standard-encodings.html
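On Python 3 the same block-wise copy can be written with the built-in `open()`, which accepts an `encoding` argument directly, so `codecs.open()` is no longer needed. A sketch under the assumption that the source is Western-European "ANSI" (`'mbcs'` only exists on Windows, so `'cp1252'` is used here as a portable stand-in):

```python
def convert_file(src, dst, src_enc="cp1252", block_size=1 << 20):
    """Copy src to dst, re-encoding from src_enc to UTF-8 in blocks."""
    with open(src, "r", encoding=src_enc) as fin, \
         open(dst, "w", encoding="utf-8") as fout:
        while True:
            chunk = fin.read(block_size)
            if not chunk:
                break
            fout.write(chunk)
```

Reading in blocks keeps memory use bounded even for very large files, which matters more than speed when converting hundreds of thousands of them.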