 

Why doesn't the grep command work on text files with UTF-16 LE encoding?

Tags: grep, utf-8, utf-16

I want to save all the lines of a text file that start with a certain string to another text file. So I used this grep command:

grep '^This' input.txt > output.txt

But the output file output.txt is empty, even though many lines in input.txt start with the word 'This'. One of my mentors suggested that input.txt is in UTF-16 LE format and asked me to convert it to UTF-8. After that, the command worked well.

Why doesn't the grep command work on files in UTF-16 LE format?

Light Yagami asked Sep 14 '25 23:09


1 Answer

grep does not detect a file's encoding. It doesn't search for "characters" as stored in the file's own encoding; it compares bytes. Your shell passes the pattern "^This" to grep as UTF-8/ASCII encoded text (the two are identical for this string), one byte per character. In a UTF-16 LE file each of those characters takes two bytes (for example, "T" is stored as 0x54 0x00), so the byte sequence for "This" never occurs in the file and nothing matches.
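As a rough illustration of the byte mismatch, here is a minimal sketch. It assumes `iconv` is installed and that the file starts with a byte order mark; the sample lines are made up for the demonstration and are not the asker's data.

```shell
# Build a small UTF-16 LE sample file with a byte order mark (made-up data).
{
  printf '\377\376'   # BOM bytes ff fe, written as octal escapes
  printf 'This is line one\nthat is line two\nThis is line three\n' \
    | iconv -f UTF-8 -t UTF-16LE
} > input.txt

# Inspect the first bytes: ff fe (BOM), then 54 00 68 00 69 00 73 00 for "This".
head -c 10 input.txt | od -An -tx1

# Searching the raw bytes finds nothing, because "This" is never stored
# as four consecutive bytes. grep exits non-zero on no match, hence "|| true".
before=$(grep -c '^This' input.txt || true)
echo "matches before conversion: $before"

# Decode to UTF-8 on the fly, then search. "-f UTF-16" lets iconv use the BOM
# to pick the byte order and drop it from the output.
iconv -f UTF-16 -t UTF-8 input.txt | grep '^This' > output.txt
cat output.txt
```

Piping through `iconv` leaves input.txt untouched. If the file had no byte order mark, the source encoding would probably need to be named explicitly (`-f UTF-16LE`).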

deceze answered Sep 16 '25 17:09