
iconv: Convert from CP1252 to UTF-8

Tags:

iconv

I'm trying to convert the CP1252-encoded string Çàïèñêè ýêñïåäèòîðà to UTF-8. I tried this command:

iconv -c -f=WINDOWS-1252 -t=UTF-8 test.txt

No luck, getting some weird results:

ÊÀÇÀÃÃœ ÃÎÂÛÉ ÂÅÊ

I tried entering the same string (Çàïèñêè ýêñïåäèòîðà) into the decoder at http://www.artlebedev.ru/tools/decoder/, and it converts it without problems.

What is going wrong?

Somebody asked Mar 15 '13 00:03


People also ask

Is CP1252 a subset of UTF-8?

Windows-1252 is a subset of UTF-8 in terms of which characters are available, but not in terms of their byte-by-byte representation. Bytes 0 to 127 (ASCII) are identical in both encodings; the characters that Windows-1252 stores as single bytes between 128 and 255 are encoded by UTF-8 as multi-byte sequences.
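
The byte-level difference can be seen directly with iconv and od. This is only a small sketch: the function name cp1252_byte_as_utf8_hex is a hypothetical helper, not part of any tool, and it assumes an iconv build that knows the WINDOWS-1252 code page. It takes one byte written as a printf octal escape and prints the UTF-8 bytes it turns into, in hex.

```shell
# Hypothetical helper: print, in hex, the UTF-8 bytes that a single
# Windows-1252 byte (given as a printf octal escape) is converted to.
cp1252_byte_as_utf8_hex() {
  printf "$1" | iconv -f WINDOWS-1252 -t UTF-8 | od -An -tx1
}

cp1252_byte_as_utf8_hex '\351'   # é is byte 0xE9 in Windows-1252
cp1252_byte_as_utf8_hex '\101'   # A is byte 0x41 in both encodings
```

The first call should show two bytes (c3 a9) and the second a single unchanged byte (41), which is why a file cannot simply be relabeled from one encoding to the other.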


1 Answer

My solution:

iconv -f windows-1252 -t utf-8 in.file -o out.file
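
One thing that may be worth checking, as an assumption about the input rather than something confirmed in the question: the sample string looks like Cyrillic text in CP1251 that is merely being displayed as CP1252. Read as CP1251, the same bytes spell Записки экспедитора. If that is the case, the source encoding to give iconv is WINDOWS-1251, not WINDOWS-1252. A minimal sketch, where fix_mojibake is a hypothetical helper name and a UTF-8 terminal is assumed:

```shell
# Hypothetical helper: reads UTF-8 mojibake on stdin (Cyrillic text that
# was mis-decoded as CP1252) and prints the intended text as UTF-8.
fix_mojibake() {
  # First iconv: re-encode to CP1252 to get the original raw bytes back.
  # Second iconv: decode those bytes as CP1251 and emit UTF-8.
  iconv -f UTF-8 -t WINDOWS-1252 | iconv -f WINDOWS-1251 -t UTF-8
}

printf '%s\n' 'Çàïèñêè ýêñïåäèòîðà' | fix_mojibake

# If the file itself still holds the raw single-byte text, one step may
# be enough (same file names as in the command above):
#   iconv -f WINDOWS-1251 -t UTF-8 in.file -o out.file
```

The two-step pipeline is only needed when the text has already been saved as UTF-8 in its garbled form; for an untouched single-byte file, choosing the right source encoding in one iconv call should do.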
Java Dude answered Sep 20 '22 16:09