An application on my computer needs to read in a text file. I have several, and one doesn't work; the program fails to read it and tells me that there is a bad character in it somewhere. My first guess is that there's a non-ASCII character in there, but I have no idea how to find it. A Perl one-liner or any generic regex would be nice. Any ideas?
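You can use [^\x20-\x7E] to match any character outside the printable ASCII range. Note that this also flags tabs and other control characters, not only non-ASCII bytes.
e.g. grep -P '[^\x20-\x7E]' suspicious_file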
perl -wne 'printf "byte %02X in line $.\n", ord $& while s/[^\t\n\x20-\x7E]//;'
will report every byte that is not a printable ASCII character (space included), a tab, or a newline, along with the line it occurs on.
If it reports 0Ds (carriage returns) in files that are O.K., then change \t\n to \t\n\r.
If it only reports 0Ds in files that are bad, then you can probably fix those files by running dos2unix on them.
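If Perl isn't handy, the same check can be sketched in Python. This is only a rough equivalent of the one-liner above, not a drop-in replacement: the function name find_bad_bytes and the extra_allowed parameter are made up for illustration, and it reports a byte column in addition to the line number.

```python
# Bytes accepted as "clean": tab, newline, and printable ASCII (0x20-0x7E),
# mirroring the character class [^\t\n\x20-\x7E] from the Perl one-liner.
ALLOWED = frozenset(b"\t\n") | frozenset(range(0x20, 0x7F))


def find_bad_bytes(data, extra_allowed=b""):
    """Return (line, column, byte) for each byte outside the allowed set.

    `data` is the raw file content as bytes. Lines and columns are 1-based,
    and columns count bytes, not characters. `extra_allowed` is a
    hypothetical knob for whitelisting more bytes, e.g. b"\r".
    """
    allowed = ALLOWED | frozenset(extra_allowed)
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte not in allowed:
                hits.append((line_no, col, byte))
    return hits
```

To use it, read the file in binary mode (open(path, "rb").read()) and print each hit, for example with f"byte {byte:02X} in line {line_no}". Passing extra_allowed=b"\r" corresponds to changing \t\n to \t\n\r in the Perl version, so files with Windows line endings stop being flagged.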
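If your files also contain tabs, try this pattern: [^\x08-\x7E]
It also works in Notepad++. Be aware that this range accepts every control character from backspace (\x08) through \x1F, not just tab (\x09); [^\t\r\n\x20-\x7E] is a stricter alternative.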