
iconv unicode unknown input format

I have a file which is described under Unix as:

$ file xxx.csv
xxx.csv: UTF-8 Unicode text, with very long lines

Viewing it in less/vi renders some special characters (ß, Ä, °, ...) unreadable (├╝); Windows does not display them correctly either; and importing the file directly into a database just turns the special characters into other special characters (+ä, +ñ, ...).

I wanted to convert it to a "default readable" encoding with iconv, but when I try, I get:

$ iconv -f UTF-8 -t ISO-8859-1 xxx.csv > yyy.csv
iconv: illegal input sequence at position 1234

Using UNICODE as the input encoding and UTF-8 as the output encoding returns the same message.

I am guessing the file is actually encoded in some other format that I do not know. How can I find out which format it is, so that I can convert it to something "universally" readable?

RRZ Europe asked Oct 07 '11 14:10


1 Answer

Converting from UTF-8 to ISO-8859-1 only works if the UTF-8 text contains only characters that can be represented in ISO-8859-1. If that is not the case, you have to specify what should happen to the remaining characters: either drop them (//IGNORE) or approximate them (//TRANSLIT). Try one of these two:

iconv -f UTF-8 -t ISO-8859-1//IGNORE --output=outfile.csv inputfile.csv
iconv -f UTF-8 -t ISO-8859-1//TRANSLIT --output=outfile.csv inputfile.csv

In most cases, I guess approximation is the better choice: it maps, for example, accented characters to their unaccented counterparts and the euro sign to EUR.
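Before choosing between //IGNORE and //TRANSLIT, it may help to see which lines actually contain characters that ISO-8859-1 cannot represent. The following is only a rough sketch: find_unconvertible is a made-up helper name, and it assumes an iconv that exits with a non-zero status when a conversion fails (as the GNU implementation does). It feeds each line to iconv separately and prints the numbers of the lines that fail. Lines that are not valid UTF-8 to begin with are reported as well, so it cannot tell the two cases apart.

```shell
# Hypothetical helper (not part of iconv): print the line numbers of all
# lines in file "$1" that iconv cannot convert from UTF-8 to ISO-8859-1.
# Assumes iconv exits non-zero on a failed conversion.
find_unconvertible() {
    n=0
    # "|| [ -n "$line" ]" also handles a last line without trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        if ! printf '%s\n' "$line" | iconv -f UTF-8 -t ISO-8859-1 >/dev/null 2>&1; then
            printf '%s\n' "$n"
        fi
    done < "$1"
}

# Usage (file name taken from the commands above):
#   find_unconvertible inputfile.csv
```

This starts one iconv process per line, so it is slow on large files, but it is usually enough to spot whether only a handful of characters (a euro sign, typographic quotes) are the problem.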

niefpaarschoenen answered Oct 05 '22 02:10