I'm trying to convert the CP1252-encoded string Çàïèñêè ýêñïåäèòîðà to UTF-8. I have tried this command:
iconv -c -f=WINDOWS-1252 -t=UTF-8 test.txt
No luck. I'm getting some weird results:
ÊÀÇÀÃÃœ ÃÎÂÛÉ ÂÅÊ
I tried entering the same string (Çàïèñêè ýêñïåäèòîðà) into this online decoder, and it converts it without problems: http://www.artlebedev.ru/tools/decoder/
What is going wrong?
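In terms of which characters are available, Windows-1252 is a subset of what UTF-8 can represent, but not in terms of their byte-by-byte representation. Windows-1252 stores the characters it places in bytes 128 to 255 (0x80 to 0xFF) as a single byte, while UTF-8 encodes those same characters as two- or three-byte sequences. For example, "é" is the single byte 0xE9 in Windows-1252 but the two bytes 0xC3 0xA9 in UTF-8.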
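To see this byte-level difference directly, you can feed single Windows-1252 bytes through iconv and dump the result in hex. This is a minimal sketch that assumes an iconv which knows the WINDOWS-1252 name (glibc and GNU libiconv both do) and the POSIX od utility. The function name show_utf8_bytes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read Windows-1252 bytes on stdin and print the
# UTF-8 encoding of the same characters as hex without spaces.
show_utf8_bytes() {
    iconv -f WINDOWS-1252 -t UTF-8 | od -An -tx1 | tr -d ' \n'
    echo
}

# 0xE9 (octal 351) is "é" in Windows-1252; UTF-8 needs two bytes for it.
printf '\351' | show_utf8_bytes   # prints c3a9

# 0x80 (octal 200) is the euro sign in Windows-1252; UTF-8 needs three bytes.
printf '\200' | show_utf8_bytes   # prints e282ac

# Plain ASCII (below 0x80) is byte-for-byte identical in both encodings.
printf 'A' | show_utf8_bytes      # prints 41
```

Octal escapes are used with printf because POSIX printf does not guarantee support for \x escapes.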
Click Tools, then select Web Options. Go to the Encoding tab. In the "Save this document as:" drop-down, choose Unicode (UTF-8). Click OK.
My solution:
iconv -f windows-1252 -t utf-8 in.file -o out.file
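Here is a slightly fuller sketch of the same conversion, with a check that the result decodes as UTF-8. It assumes a POSIX shell. The sample contents of in.file are made up for illustration. The short options take the encoding name as a separate word (-f WINDOWS-1252). The "=" form is for the long options (--from-code=..., --to-code=...). Redirecting standard output instead of using -o should also work if your iconv build does not offer -o.

```shell
#!/bin/sh
set -e

# Hypothetical sample input: "café" saved as Windows-1252 (é = byte 0xE9).
printf 'caf\351\n' > in.file

# Short options take the encoding as a separate argument; the "=" form
# belongs to the long options (--from-code=..., --to-code=...).
iconv -f WINDOWS-1252 -t UTF-8 in.file > out.file

# Re-reading the output as UTF-8 exits non-zero if it contains invalid sequences.
iconv -f UTF-8 -t UTF-8 out.file > /dev/null && echo "out.file is valid UTF-8"
```

Unlike the command in the question, this one omits -c. With -c, iconv silently drops characters it cannot convert, which can hide a wrong source encoding.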