I have a large text file that contains a few Unicode characters that make LaTeX crash. How can I find non-ASCII characters in a file with sed or similar tools in a Linux bash shell?
To identify the non-ASCII characters, you can use either Google Chrome or Mozilla Firefox by dragging and dropping the file into the browser. Chrome will show only the row and column number of the offending character.
In Notepad++, if you go to menu Search → Find characters in range → Non-ASCII Characters (128-255), you can then step through the document to each non-ASCII character. Be sure to tick "Wrap around" if you want the search to wrap through the whole document and hit every non-ASCII character.
On BSD, pipe the ls -q output through cat -v or od -c to see what the non-printing characters are. For example, if that shows octal values 13 and 14, looking them up in an ASCII table tells you they correspond to CTRL-k and CTRL-l.
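The same byte-level inspection works on a regular file on Linux. A minimal sketch (thesis.tex is a placeholder name for the file that breaks LaTeX):

cat -v thesis.tex         # non-printing bytes show up as ^K, ^L, M-... sequences
od -c thesis.tex | less   # dumps every byte, using C escapes or octal for non-printing ones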
Try:
nonascii() { LANG=C grep --color=always '[^ -~]\+'; }
Which can be used like:
printf 'ŨTF8\n' | nonascii
Within [], ^ means "not", so [^ -~] matches any character that is not between space and ~ (i.e. not printable ASCII). Apart from also matching control characters, this matches non-ASCII characters, and it is a more portable, though slightly less accurate, version of the [^\x00-\x7f] shown below. The \+ means "one or more", so multibyte characters get the color applied around the complete character(s) rather than interspersed in each byte, which would corrupt the multibyte sequence.
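To run this over a whole file and see where the offending characters are (a sketch; thesis.tex is again a placeholder file name), add -n so grep prefixes each matching line with its line number:

LANG=C grep -n --color=always '[^ -~]\+' thesis.tex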
Try this command:
grep -P '[^\x00-\x7f]' file
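Note that -P needs a grep built with PCRE support (GNU grep usually has it); adding -n also prints line numbers. Since the question mentions sed, here is a rough GNU sed equivalent as a sketch, using the same C-locale printable-ASCII range trick (file is a placeholder); = prints the line number of each line that contains a non-printable or non-ASCII byte, and p prints the line itself:

LC_ALL=C sed -n '/[^ -~]/{=;p;}' file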