How to read files with different encodings using Awk?

How can I correctly read files in encodings other than UTF8 in Awk?

I have a file in Hebrew (Windows-1255 encoding). A simple awk '{print $0}' prints characters like �. How can I make awk read it correctly?

Ofri Raviv asked Nov 30 '09 15:11


1 Answer

awk itself doesn't have any support for handling different encodings. It will honor the locale specified in the environment, but your best bet is to transcode the input to the proper encoding before handing it off to awk.

In the command below, -f is the encoding to convert from, -t is the target encoding, and -c discards any characters that cannot be converted, which would otherwise make iconv stop prematurely. iconv --help gives more details.

iconv -c -f cp1255 -t utf8 somefile | awk ...
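To make the pipeline above concrete, here is a minimal sketch that builds a small Windows-1255 sample and runs it through iconv and awk. The temporary file and the variable names sample and converted are made up for illustration, and it assumes your iconv build recognizes the names CP1255 and UTF-8 (iconv -l lists what is supported).

```shell
# Hypothetical sample: the Hebrew word "שלום" encoded as Windows-1255,
# where each letter is a single byte (written here as octal escapes).
sample=$(mktemp)
printf '\371\354\345\355\n' > "$sample"

# Transcode to UTF-8 first, then hand the text to awk as usual.
converted=$(iconv -c -f CP1255 -t UTF-8 "$sample" | awk '{ print $0 }')
printf '%s\n' "$converted"

# Clean up the temporary file.
rm -f "$sample"
```

On a UTF-8 terminal the last printf should show the Hebrew word rather than replacement characters. If awk has to write the result back in the original encoding, a second iconv with -f and -t swapped can be appended to the pipeline.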
jamessan answered Oct 22 '22 09:10