
grep UTF-16 support

I used TextEdit on Mac OS X to create two files with the same contents but different encodings (UTF-16 and UTF-8), then ran:

grep xxx filename_UTF-16

(no output)

grep xxx filename_UTF-8

xxxxxxx xxxxxxyyyyyy

Does grep not support UTF-16?
toughtalker asked Jul 30 '11 08:07



2 Answers

iconv -f UTF-16 -t UTF-8 yourfile | grep xxx
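A likely reason the plain grep prints nothing: in UTF-16 every ASCII character takes two bytes, one of which is NUL, so the contiguous bytes "xxx" never occur in the file. The sketch below makes that visible. The helper name `to_utf16le` is hypothetical, and UTF-16LE is chosen only for illustration; the file TextEdit wrote may be big-endian and may start with a byte-order mark.

```shell
# Hypothetical helper: print STRING encoded as UTF-16LE (no byte-order mark).
to_utf16le() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-16LE
}

# Dump the bytes: each "x" is followed by a NUL byte,
# so the contiguous byte sequence "xxx" never appears.
to_utf16le xxx | od -An -c

# Three characters become six bytes (one 16-bit code unit each).
to_utf16le xxx | wc -c
```

Converting to UTF-8 first removes the interleaved NUL bytes, which is why piping through iconv lets the pattern match.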
hmontoliu answered Sep 19 '22 23:09


You could always convert the file to UTF-8 first:

iconv -f utf-16 -t utf-8 filename | grep xxxxx
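If this comes up often, the pipeline can be wrapped in a small function. This is only a sketch: the name `grep_utf16` and the sample text are made up for illustration, and it assumes the input really is UTF-16 (iconv is left to pick the byte order from the byte-order mark).

```shell
# Hypothetical helper: search a UTF-16 file by converting it to UTF-8 first.
# Usage: grep_utf16 PATTERN FILE
grep_utf16() {
    # "UTF-16" without an LE/BE suffix lets iconv use the byte-order mark
    # at the start of the file to decide the byte order.
    iconv -f UTF-16 -t UTF-8 "$2" | grep -- "$1"
}

# Demo: build a small UTF-16 sample file (contents are illustrative).
sample=$(mktemp)
printf 'xxxxxxx xxxxxxyyyyyy\nno match here\n' | iconv -f UTF-8 -t UTF-16 > "$sample"

# Prints the first line only; the second line does not contain "xxx".
grep_utf16 xxx "$sample"

rm -f "$sample"
```

The `--` keeps grep from treating a pattern that starts with a dash as an option.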
ninjalj answered Sep 19 '22 23:09