
Charset conversion from XXX to utf-8, command line

I have a bunch of text files encoded in ISO-8859-2 (they contain some Polish characters). Is there a command-line tool for Linux/Mac that I could run from a shell script to convert them to a saner UTF-8?

Marcin, asked Apr 27 '10 15:04



1 Answer

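Use iconv, giving the source encoding with -f and the target encoding with -t. For the ISO-8859-2 files in the question (LATIN1 is ISO-8859-1 and would decode the Polish characters incorrectly):

iconv -f ISO-8859-2 -t UTF-8 input.txt > output.txt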

Some more information:

  • You may want to specify UTF-8//TRANSLIT instead of plain UTF-8. To quote the manpage:

    If the string //TRANSLIT is appended to to-encoding, characters being converted are transliterated when needed and possible. This means that when a character cannot be represented in the target character set, it can be approximated through one or several similar looking characters. Characters that are outside of the target character set and cannot be transliterated are replaced with a question mark (?) in the output.

  • For a full list of encoding codes accepted by iconv, execute iconv -l.

  • The example above makes use of shell redirection. Make sure you are not using a shell that mangles encodings on redirection – that is, do not use PowerShell for this.
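Since the question mentions a whole bunch of files and a shell script, here is a minimal sketch of a batch conversion. The function name convert_dir, the *.txt pattern and the directory arguments are assumptions made for illustration, not part of the original answer; adjust them to your layout.

```shell
#!/bin/sh
# Sketch: convert every .txt file in a directory from ISO-8859-2 to UTF-8.
# convert_dir, the *.txt pattern and both directory arguments are
# illustrative assumptions.
convert_dir() {
    src_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for f in "$src_dir"/*.txt; do
        # If the pattern matched nothing, the loop sees the literal pattern.
        [ -e "$f" ] || continue
        # Write to a separate file: redirecting onto the input file
        # would truncate it before iconv reads it.
        iconv -f ISO-8859-2 -t UTF-8 "$f" > "$out_dir/$(basename "$f")" || {
            echo "conversion failed: $f" >&2
            return 1
        }
    done
}
```

It could be called as convert_dir ./legacy ./utf8, which leaves the originals untouched and writes the converted copies into ./utf8. Writing to a separate directory is a deliberate choice here, so a failed run never destroys the source files.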
lhf, answered Oct 13 '22 02:10