Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to search for non-ASCII characters with bash tools?

Tags:

grep

bash

unicode

I have a large text file that contains a few unicode characters that make LaTeX crash. How can I find non-ASCII characters in a file with sed, and the like in a Linux bash?

like image 264
Jonas Stein Avatar asked Nov 28 '12 01:11

Jonas Stein


People also ask

How do I find a non Unicode character?

To identify the Non Unicode characters we can use either Google Chrome or Mozilla firefox browser by just dragging and dropping the file to the browser. Chrome will show us only the row and column number of the .

How do you find non-ASCII characters in notepad?

In Notepad++, if you go to menu Search → Find characters in range → Non-ASCII Characters (128-255) you can then step through the document to each non-ASCII character. Be sure to tick off "Wrap around" if you want to loop in the document for all non-ASCII characters.

How do I find a non printable character in Unix?

[3] On BSD, pipe the ls -q output through cat -v or od -c ( 25.7 ) to see what the non-printing characters are. This shows that the non-printing characters have octal values 13 and 14, respectively. If you look up these values in an ASCII table ( 51.3 ) , you will see that they correspond to CTRL-k and CTRL-l.


2 Answers

Try:

nonascii() { LANG=C grep --color=always '[^ -~]\+'; } 

Which can be used like:

printf 'ŨTF8\n' | nonascii 

Within [] ^ means "not". So [^ -~] means characters not between space and ~. So excluding control chars, this matches non ASCII characters, and is a more portable though slightly less accurate version of [^\x00-\x7f] below. The \+ means 1 or more and will get multibye characters to have a color shown around the complete character(s), rather than interspersed in each byte, thus corrupting the multibyte sequence

like image 112
pixelbeat Avatar answered Sep 17 '22 15:09

pixelbeat


Try this command:

grep -P '[^\x00-\x7f]' file 
like image 32
kev Avatar answered Sep 21 '22 15:09

kev