How do I grep for all non-ASCII characters?

Tags: regex, grep, unix, unicode

I have several very large XML files and I'm trying to find the lines that contain non-ASCII characters. I've tried the following:

grep -e "[\x{00FF}-\x{FFFF}]" file.xml

But this returns every line in the file, regardless of whether the line contains a character in the range specified.

Do I have the syntax wrong or am I doing something else wrong? I've also tried:

egrep "[\x{00FF}-\x{FFFF}]" file.xml

(with both single and double quotes surrounding the pattern).

asked Jun 08 '10 20:06

pconrey

1 Answer

You can use the command:

grep --color='auto' -P -n "[\x80-\xFF]" file.xml

This prints the line number of each matching line and highlights the non-ASCII characters in red.

On some systems, depending on your settings (the locale, for example), the above will not work, so you can grep for the inverse instead:

grep --color='auto' -P -n "[^\x00-\x7F]" file.xml

Note that the important bit is the -P flag, which is equivalent to --perl-regexp, so grep interprets your pattern as a Perl regular expression. The grep documentation also says that

this is highly experimental and grep -P may warn of unimplemented features.
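
To check the inverse pattern on a small input before running it over large XML files, something like the following sketch may help. It assumes GNU grep built with -P support. The file name sample.xml and its contents are made up for the demo, and the LC_ALL=C setting and the -a flag are additions that are not part of the commands above: they make grep work on raw bytes and treat the input as text, so the result should not depend on the locale.

```shell
#!/bin/sh
# Hypothetical sample file: lines 2 and 4 contain UTF-8 encoded
# non-ASCII characters (e-acute and the euro sign), lines 1 and 3 do not.
printf '<a>plain</a>\n<b>caf\303\251</b>\n<c>ok</c>\n<d>\342\202\254 5</d>\n' > sample.xml

# LC_ALL=C : match on bytes, so anything outside 0x00-0x7F is non-ASCII.
# -a       : treat the file as text even if grep would guess it is binary.
# -n       : prefix each matching line with its line number.
# -P       : interpret the pattern as a Perl regular expression.
matches=$(LC_ALL=C grep -a -n -P '[^\x00-\x7F]' sample.xml)
printf '%s\n' "$matches"

# -c prints only the number of matching lines instead of the lines themselves.
count=$(LC_ALL=C grep -a -c -P '[^\x00-\x7F]' sample.xml)
echo "lines with non-ASCII characters: $count"
```

With this sample, only lines 2 and 4 should be reported. Once the output looks right, the same command can be pointed at the real file, with --color='auto' added back if highlighting is wanted.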

answered Sep 22 '22 17:09

jerrymouse
