I am trying to remove non-printable characters (e.g. ^@) from records in my file. Since the volume of records in the file is very large, using cat in a loop is not an option, as it takes too much time.
I tried using
sed -i 's/[^@a-zA-Z 0-9`~!@#$%^&*()_+\[\]\\{}|;'\'':",.\/<>?]//g' FILENAME
but the ^@ characters are still not removed.
Also I tried using
awk '{ sub(/[^a-zA-Z0-9"!@#$%^&*|_[\](){}]/, ""); print }' FILENAME > NEWFILE
but it did not help either.
Can anybody suggest an alternative way to remove non-printable characters?
I also tried tr -cd, but it removes accented characters, which are required in the file.
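Since ^@ is how many tools display the NUL byte (ASCII 0), one option worth trying is to delete only that byte instead of whitelisting printable characters; a byte-level tr delete leaves the bytes of multi-byte accented characters untouched. A minimal sketch (FILENAME and NEWFILE are placeholders):

```shell
# Delete only NUL bytes (shown as ^@ by editors); every other byte,
# including the bytes of multi-byte accented characters, passes through.
tr -d '\000' < FILENAME > NEWFILE
```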
Perhaps you could go with the complement of [:print:], which contains all printable characters:
tr -cd '[:print:]' < file > newfile
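One caveat worth noting: the [:print:] class does not include the newline character, so the command above also deletes line breaks and joins the whole file into one line. If the records are line-based, keep the newline as well:

```shell
# Keep printable characters plus the newline, so records stay on
# separate lines; everything else (NUL and other control bytes) is deleted.
tr -cd '[:print:]\n' < file > newfile
```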
If your version of tr doesn't support multi-byte characters (it seems that many don't), this works for me with GNU sed (with UTF-8 locale settings):
sed 's/[^[:print:]]//g' file
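To check that accented characters survive the sed approach, a quick experiment (this assumes GNU sed and a UTF-8 locale; the printf string is just made-up sample data):

```shell
# The NUL (\000) and vertical tab (\013) are non-printable and are
# removed; in a UTF-8 locale, é counts as one printable character and stays.
printf 'caf\303\251\000\013done\n' | sed 's/[^[:print:]]//g'
```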