
Search a document for non-ASCII characters

An application on my computer needs to read in a text file. I have several such files, and one doesn't work: the program fails to read it and tells me that it contains a bad character. My first guess is that there's a non-ASCII character in there somewhere, but I have no idea how to find it. A Perl solution or any generic regex would be nice. Any ideas?

Nate Glenn asked Jan 13 '12 02:01


3 Answers

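You can use [^\x20-\x7E] to match any character outside the printable ASCII range (space through tilde). Note that this also flags tabs and carriage returns, which are ASCII but not printable.

e.g. grep -P '[^\x20-\x7E]' suspicious_file

(The -P option, which enables Perl-compatible regular expressions, is a GNU grep feature.)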
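As a rough sketch of how the grep approach could be wrapped for repeated use, the function below prints each offending line with its line number. The name find_nonascii is hypothetical. As an assumption made for portability, it swaps the -P pattern for POSIX character classes under the C locale, so that tabs and carriage returns are not flagged and every byte is treated as a single character.

```shell
# find_nonascii FILE (hypothetical helper name)
# Prints "LINE_NUMBER:LINE" for every line that contains a byte which is
# neither printable ASCII nor ASCII whitespace (tab, CR, and so on).
# LC_ALL=C makes grep work byte by byte instead of decoding UTF-8.
find_nonascii() {
    LC_ALL=C grep -n '[^[:print:][:space:]]' "$1"
}
```

Calling find_nonascii suspicious_file lists the suspect lines; like grep itself, the function returns a non-zero status when nothing is found.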

mathematical.coffee answered Oct 23 '22 13:10


perl -wne 'printf "byte %02X in line $.\n", ord $& while s/[^\t\n\x20-\x7E]//;'

will report every byte that is not a tab, a newline, or a printable ASCII character (space through tilde), along with the number of the line it occurs on.

If it reports 0D bytes (carriage returns) in files that the application reads without trouble, change \t\n to \t\n\r in the pattern so that they are ignored.

If it reports 0D bytes only in the files that fail, then you can probably fix those files by running dos2unix on them.

ruakh answered Oct 23 '22 12:10


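If your files also contain tabs, try this pattern:

[^\x08-\x7E]

Note that the range starts at \x08 (backspace), one below tab (\x09), so it lets through backspace and every control character from \x0A to \x1F as well as tabs. A stricter alternative that allows only tab, newline, and carriage return is [^\t\n\r\x20-\x7E].

This also works in Notepad++ when the search mode is set to regular expression.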

elwood answered Oct 23 '22 11:10