
Removing Control Characters from a File

Tags:

linux

I want to delete all the control characters from my file using Linux bash commands.

Some control characters, especially EOF (0x1A), cause problems when I load the file in another program, so I want to remove them.

Here is what I have tried so far:

This will list all the control characters:

$ cat -v -e -t file.txt | head -n 10
^A+^X$
^A1^X$
^D ^_$
^E-^D$
^E-^S$
^E1^V$
^F%^_$
^F-^D$
^F.^_$
^F/^_$
^F4EZ$
^G%$

This will list all the lines containing control characters using grep:

$ cat file.txt | head -n 10 | grep '[[:cntrl:]]'
+ 1  - - 1 % - . /

This matches the output of the cat command above.

Next, I ran the following command to show only the lines that do not contain control characters, but it still prints the same output as above (lines with control characters):

$ cat file.txt | head -n 10 | grep '[^[:cntrl:]]'
+ 1  - - 1 % - . /

Here is the output in hex format:

$ cat file.txt | head -n 10 | grep '[[:cntrl:]]' | od -t x2
0000000 2b01 0a18 3101 0a18 2004 0a1f 2d05 0a04
0000020 2d05 0a13 3105 0a16 2506 0a1f 2d06 0a04
0000040 2e06 0a1f 2f06 0a1f
0000050

As you can see, the hex values 0x01 and 0x18 are control characters.

I tried using the tr command to delete the control characters but got an error:

$ cat file.txt | tr -d "\r\n" "[:cntrl:]" >> test.txt
tr: extra operand `[:cntrl:]'
Only one string may be given when deleting without squeezing repeats.
Try `tr --help' for more information.

If I delete all control characters, I will also delete the carriage return and newline, which together mark the end of a line on Windows. How do I delete all the control characters while keeping only the required ones, such as "\r\n"?

Thanks.

Neon Flash asked Feb 04 '13 03:02


2 Answers

Instead of using the predefined [:cntrl:] set, which as you observed includes \n and \r, just list (in octal) the control characters you want to get rid of:

$ tr -d '\000-\011\013\014\016-\037' < file.txt > newfile.txt 
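
Note that the set above also deletes TAB (\011) and leaves DEL (\177) in place. If you want to keep tabs and drop DEL as well, here is a minimal sketch of a variant. The sample bytes and the variable names `sample` and `cleaned` are made up for illustration, and `LC_ALL=C` is only there on the assumption that you want byte-wise handling regardless of locale:

```shell
# Hypothetical sample: control bytes 0x01, 0x18, 0x1A and 0x7F mixed with text, a TAB and CRLF endings.
sample=$(mktemp)
cleaned=$(mktemp)
printf '\001+\030\r\n\001A\032B\tC\177\r\n' > "$sample"

# Delete NUL through BS, VT, FF, SO through US, and DEL.
# TAB (\011), LF (\012) and CR (\015) are left out of the set, so they survive.
LC_ALL=C tr -d '\000-\010\013\014\016-\037\177' < "$sample" > "$cleaned"

# Dump the result one byte at a time to see what is left.
od -An -c "$cleaned"
```

The `od -c` dump should show only the printable text plus `\t`, `\r` and `\n`.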
Kyle Barbour answered Sep 27 '22 20:09


Based on this answer on unix.stackexchange, this should do the trick:

$ cat scriptfile.raw | col -b > scriptfile.clean 
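
Whichever command you use, it may be worth checking the cleaned file afterwards. This is a rough sketch, not part of the linked answer: `count_ctrl` and `demo` are hypothetical names, and it counts TAB as a control character because `[:cntrl:]` includes it. It uses tr twice because `tr -d` only accepts a single set:

```shell
# Hypothetical helper: count control bytes other than CR and LF in a file.
count_ctrl() {
    # Drop CR and LF first, then delete everything that is NOT a control
    # character (-c complements the set) and count the bytes that remain.
    LC_ALL=C tr -d '\r\n' < "$1" | LC_ALL=C tr -cd '[:cntrl:]' | wc -c
}

# Made-up sample with two stray control bytes (0x01 and 0x1A) and CRLF endings.
demo=$(mktemp)
printf 'ok\r\n\001bad\032\r\n' > "$demo"

count_ctrl "$demo"
```

A result of 0 would mean only CR and LF are left as control characters; for the sample here the count is 2.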
Stephen Boston answered Sep 27 '22 20:09