
Identify and remove specific hidden characters from text file

Tags: bash, unix, sed

I have a text file that contains several hidden characters. Using cat -v I can see that they include the following:

^M

^[[A

There are also \n characters at the end of the line. I would like to be able to display these as well somehow.

Then I would like to be able to selectively cut and sed these hidden characters. How would I go about accomplishing this?

I've tried dos2unix, but it didn't remove any of the ^M characters. I've also tried sed s/^M//g, where I entered the ^M by pressing ctrl+v m.


Raw data

Output from cat -v on the raw data, also available at: http://pastebin.com/Vk2i81JC

^MCopying non-tried blocks... Pass 1 (forwards)^M^[[A^[[A^[[Arescued:         0 B,  errsize:       0 B,  current rate:        0 B/s
   ipos:         0 B,   errors:       0,    average rate:        0 B/s
   opos:         0 B, run time:       1 s,  successful read:       1 s ago
^MFinished

Output wanted

Also available at: http://pastebin.com/wfDnrELm

rescued:         0 B,  errsize:       0 B,  current rate:        0 B/s
   ipos:         0 B,   errors:       0,    average rate:        0 B/s
   opos:         0 B, run time:       1 s,  successful read:       1 s ago
Finished

p014k asked Sep 11 '14 03:09


1 Answer

Try the tr command below, which translates or deletes characters. With -c and -d together, it deletes every character except those listed, as octal values, inside the quotes.

Octal \12 is newline (\n), octal \11 is TAB (^I), and octal \40-\176 is the range of printable ASCII characters.

For a complete reference of octal values refer to this page: https://courses.engr.illinois.edu/ece390/books/labmanual/ascii-code-table.html
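
To check which of these octal values actually occur in a file, including the trailing \n, something like the following may help. It is only a sketch: the file name hidden_sample.txt and its contents are made up for illustration.

```shell
# Hypothetical sample containing CR (octal 015), ESC (octal 033), TAB and LF.
printf '\rdone\033[A\tok\n' > hidden_sample.txt

# od -c prints every byte; non-printing ones appear as C escapes (\r, \t, \n)
# or as three-digit octal values (033 for ESC).
od -c hidden_sample.txt

# sed's "l" command prints each line in an unambiguous form and marks
# the end of the line with "$".
sed -n l hidden_sample.txt
```

With GNU cat, cat -A (equivalent to cat -vET) is another option: it shows the same ^M and ^[ notation as cat -v and additionally marks each line end with $.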

tr -cd '\11\12\40-\176' < org.txt > new.txt

The file new.txt will contain the text of org.txt with those characters removed.
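
As a quick illustration of what this filter keeps and drops, here is a small sketch on made-up input. The file names tr_sample.txt and tr_clean.txt are placeholders, not files from the question.

```shell
# Hypothetical input: a CR, the word "Finished", an ESC [ A sequence, a TAB, "ok", LF.
printf '\rFinished\033[A\tok\n' > tr_sample.txt

# Keep only TAB (\11), LF (\12) and printable ASCII (\40-\176); delete the rest.
tr -cd '\11\12\40-\176' < tr_sample.txt > tr_clean.txt

# Inspect the result byte by byte.
od -c tr_clean.txt
# tr_clean.txt now holds: Finished[A<TAB>ok followed by a newline
```

The CR and the ESC byte are deleted, but the "[A" that followed the ESC is printable ASCII and therefore survives.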

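To also remove the text between two ^M characters on a line (together with the ^M characters themselves) and then strip the remaining control characters, use the command below. Note that tr deletes only the ESC byte of each ^[[A sequence, so the printable [A after it stays in the output.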

sed "s/\r.*\r//g" org.txt | tr -cd '\11\12\40-\176' > new.txt
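
A minimal sketch of one way to remove each whole ^[[A sequence rather than just its ESC byte: strip the escape sequences with sed before the tr step. This is an assumption about what the asker wants, not part of the original command. The file names and the shortened sample data are made up, and the CR and ESC bytes are generated with printf because \r inside a sed pattern is a GNU extension.

```shell
# Shortened, hypothetical version of the raw data, written as real bytes.
printf '\rCopying non-tried blocks... Pass 1 (forwards)\r\033[A\033[A\033[Arescued: 0 B\n\rFinished\n' > org_sample.txt

# Literal CR and ESC bytes, so the sed patterns do not rely on \r or \x1b support.
cr=$(printf '\r')
esc=$(printf '\033')

# 1st expression: drop everything from the first CR to the last CR on a line.
# 2nd expression: drop sequences of the form ESC [ <digits or ;> <letter>, e.g. ESC [ A.
# tr then deletes any remaining bytes outside TAB, LF and printable ASCII.
sed -e "s/${cr}.*${cr}//" -e "s/${esc}\[[0-9;]*[A-Za-z]//g" org_sample.txt |
  tr -cd '\11\12\40-\176' > new_sample.txt

cat new_sample.txt
```

The second pattern only covers sequences that start with ESC [ and end in a letter, which includes the cursor-up code seen here; other kinds of escape sequences would need their own patterns.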

Ram answered Nov 15 '22 03:11