 

Trying to remove non-printable characters (junk values) from a UNIX file

I am trying to remove non-printable characters (e.g. ^@) from the records in my file. The file has so many records that looping over it with cat is not an option; the loop takes far too long. I tried using

sed -i 's/[^@a-zA-Z 0-9`~!@#$%^&*()_+\[\]\\{}|;'\'':",.\/<>?]//g' FILENAME

but still the ^@ characters are not removed. Also I tried using

awk '{ sub("[^a-zA-Z0-9\"!@#$%^&*|_\[](){}", ""); print }' FILENAME > NEWFILE

but it also did not help.

Can anybody suggest some alternative way to remove non-printable characters?

I also tried tr -cd, but it removes accented characters, and I need to keep those in the file.
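If the only junk in the file is the ^@ shown above, note that ^@ is caret notation (as displayed by cat -v and vi) for the NUL byte, octal 000. Under that assumption, a narrower sketch is to delete just that one byte with tr -d, which passes every other byte through unchanged, including the multi-byte UTF-8 encoding of accented characters. The file names sample.txt and sample.clean.txt are hypothetical.

```shell
# Hypothetical sample file: the first record contains a UTF-8 accented e
# (bytes 303 251 in octal) followed by a NUL byte, which cat -v shows as ^@.
printf 'caf\303\251\000 ok\nline two\n' > sample.txt

# Delete only NUL bytes (octal 000). Nothing else is touched, so the
# accented character and the newlines between records survive.
tr -d '\000' < sample.txt > sample.clean.txt
```

This will not help if the file also contains other control characters that need to go; it only targets NUL.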

asked Dec 22 '15 09:12 by Pranav


1 Answer

Perhaps you could go with the complement of [:print:], which contains all printable characters:

tr -cd '[:print:]' < file > newfile
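One caveat: the newline character is not a member of [:print:], so the command above also deletes the line breaks and joins every record into a single line. A minimal sketch that adds \n to the set of characters to keep (records.txt, joined.txt and newfile.txt are hypothetical names):

```shell
# Hypothetical input: a NUL byte in the first record, a tab in the second.
printf 'ab\000c\nd\te\n' > records.txt

# Plain [:print:] complement: NUL and tab are removed, but so are the
# newlines, so both records end up on one line.
tr -cd '[:print:]' < records.txt > joined.txt

# Keeping \n as well preserves one record per line.
tr -cd '[:print:]\n' < records.txt > newfile.txt
```

This is still byte-oriented in tr implementations without multi-byte support, so it has the same accented-character problem the question reports.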

If your version of tr doesn't support multi-byte characters (it seems that many don't), this works for me with GNU sed (with UTF-8 locale settings):

sed 's/[^[:print:]]//g' file
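A runnable sketch of the sed approach, assuming GNU sed and a C.UTF-8 locale (check with locale -a; another installed UTF-8 locale such as en_US.UTF-8 should behave the same way). Because sed works line by line, the newlines between records are kept without any extra effort. The file names are hypothetical.

```shell
# Hypothetical record: a UTF-8 accented e, then a NUL byte and a tab.
printf 'caf\303\251\000\tok\n' > data.txt

# In a UTF-8 locale the two bytes of the accented e form one printable
# character, so only the NUL and the tab match [^[:print:]] and are removed.
LC_ALL=C.UTF-8 sed 's/[^[:print:]]//g' data.txt > data.clean.txt

# To edit the file in place instead, as the question does (GNU sed -i):
#   LC_ALL=C.UTF-8 sed -i 's/[^[:print:]]//g' FILENAME
```

If the locale is not UTF-8 (for example plain C), the accented character's bytes are treated as non-printable and deleted, which is the same symptom as with tr.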
answered Nov 15 '22 05:11 by Tom Fenech