I have a text file that contains several hidden characters. Using cat -v
I am able to see that they include the following:
^M
^[[A
There are also \n characters at the end of each line. I would like to be able to display these as well somehow.
Then I would like to be able to selectively cut and sed these hidden characters. How would I go about accomplishing this?
I've tried dos2unix, but that didn't remove any of the ^M characters. I've also tried sed s/^M//g, where I entered the ^M by pressing ctrl+v m.
Output from cat -v on the raw data, also available at: http://pastebin.com/Vk2i81JC
^MCopying non-tried blocks... Pass 1 (forwards)^M^[[A^[[A^[[Arescued: 0 B, errsize: 0 B, current rate: 0 B/s
ipos: 0 B, errors: 0, average rate: 0 B/s
opos: 0 B, run time: 1 s, successful read: 1 s ago
^MFinished
Desired output, also available at: http://pastebin.com/wfDnrELm
rescued: 0 B, errsize: 0 B, current rate: 0 B/s
ipos: 0 B, errors: 0, average rate: 0 B/s
opos: 0 B, run time: 1 s, successful read: 1 s ago
Finished
Try the tr command below, which translates or deletes characters. With -c and -d it deletes every character except the ones listed in octal inside the quotes:
octal \12 is newline (\n), octal \11 is TAB (^I), and octal \40-\176 is the range of printable ASCII characters (space through ~).
For a complete reference of octal values refer to this page: https://courses.engr.illinois.edu/ece390/books/labmanual/ascii-code-table.html
tr -cd '\11\12\40-\176' < org.txt > new.txt
The file new.txt will contain the text with all other characters removed.
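To see what the filter keeps and drops, and to make the line endings visible as the question asks, something like the following may help. This is only a sketch: the function name strip_nonprintable and the sample string are made up for illustration. cat -vet marks each line end with "$" and each TAB with "^I", on top of what cat -v already shows.

```shell
# Keep TAB (\11), LF (\12) and printable ASCII (\40-\176); delete everything else.
# strip_nonprintable is a hypothetical helper name.
strip_nonprintable() {
    tr -cd '\11\12\40-\176'
}

# Hypothetical sample: CR, an ESC[A cursor-up sequence, a TAB and a newline.
sample='\rA\033[AB\tC\n'

# cat -vet marks each line end with "$" and each TAB with "^I".
printf "$sample" | cat -vet                       # prints: ^MA^[[AB^IC$
printf "$sample" | strip_nonprintable | cat -vet  # prints: A[AB^IC$

# od -c is an alternative that names every byte, including \r and \n.
printf "$sample" | od -c
```

Note that the second line of output still contains "[A": tr removes the ESC byte, but the "[A" that followed it is printable ASCII and is kept.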
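To also remove the text between two ^M characters (the status text that a carriage return would overwrite on a terminal), and then strip the remaining control characters, use the command below. It relies on sed understanding \r in the pattern, which GNU sed does: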
sed "s/\r.*\r//g" org.txt | tr -cd '\11\12\40-\176' > new.txt
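One caveat: a ^[[A sequence is three bytes (ESC, "[", "A"), and tr only deletes the ESC, so "[A" can be left behind in front of "rescued:". A possible variant removes the whole escape sequence in sed before the tr pass. This is a sketch, not a definitive fix: clean_log is a hypothetical name, and the pattern only assumes simple sequences of the form ESC [ digits-or-semicolons letter. The CR and ESC bytes are produced with printf so that sed does not need to understand \r or \x1b itself.

```shell
# clean_log is a hypothetical helper name; it reads stdin and writes stdout.
clean_log() {
    cr=$(printf '\r')     # literal carriage return
    esc=$(printf '\033')  # literal ESC byte
    # 1. drop text enclosed between two CRs on the same line
    # 2. drop escape sequences such as ESC[A (assumed form: ESC [ params letter)
    # 3. drop any CR that is left
    sed -e "s/${cr}.*${cr}//" \
        -e "s/${esc}\[[0-9;]*[A-Za-z]//g" \
        -e "s/${cr}//g" |
    tr -cd '\11\12\40-\176'   # final pass: keep TAB, LF and printable ASCII
}

# Shortened version of the raw data from the question.
printf '\rCopying non-tried blocks...\r\033[A\033[A\033[Arescued: 0 B\n\rFinished\n' | clean_log
```

On the shortened sample this prints "rescued: 0 B" and "Finished" on separate lines. For the real file it could be run as: clean_log < org.txt > new.txt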