I am trying to remove non-printable characters (e.g. ^@) from records in my file. Since the volume of records in the file is very large, using cat in a loop is not an option, as it takes too much time.
I tried using
sed -i 's/[^@a-zA-Z 0-9`~!@#$%^&*()_+\[\]\\{}|;'\'':",.\/<>?]//g' FILENAME
but the ^@ characters are still not removed.
Also I tried using
awk '{ sub(/[^a-zA-Z0-9"!@#$%^&*|_[\](){}]/, ""); print }' FILENAME > NEWFILE
but it did not help either.
Can anybody suggest an alternative way to remove non-printable characters?
I also tried tr -cd, but it removes accented characters, which are required in the file.
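Since ^@ is how many tools display the NUL byte (ASCII 0), one option worth trying is to delete only that byte instead of whitelisting printable characters; a byte-level tr delete leaves the bytes of multi-byte accented characters untouched. A minimal sketch (FILENAME and NEWFILE are placeholders):

```shell
# Delete only NUL bytes (shown as ^@ by editors); every other byte,
# including the bytes of multi-byte accented characters, passes through.
tr -d '\000' < FILENAME > NEWFILE
```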
Perhaps you could go with the complement of [:print:], which contains all printable characters:
tr -cd '[:print:]' < file > newfile
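One caveat worth noting: the [:print:] class does not include the newline character, so the command above also deletes line breaks and joins the whole file into one line. If the records are line-based, keep the newline as well:

```shell
# Keep printable characters plus the newline, so records stay on
# separate lines; everything else (NUL and other control bytes) is deleted.
tr -cd '[:print:]\n' < file > newfile
```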
If your version of tr doesn't support multi-byte characters (it seems that many don't), this works for me with GNU sed (with UTF-8 locale settings):
sed 's/[^[:print:]]//g' file
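To check that accented characters survive the sed approach, a quick experiment (this assumes GNU sed and a UTF-8 locale; the printf string is just made-up sample data):

```shell
# The NUL (\000) and vertical tab (\013) are non-printable and are
# removed; in a UTF-8 locale, é counts as one printable character and stays.
printf 'caf\303\251\000\013done\n' | sed 's/[^[:print:]]//g'
```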