Replacing non-printables with their vi editor code using sed

Tags: regex, sed, csh

Hi everyone, I'm having an issue with the sed program.

Problem:

I'm a CS student just learning Unix, and I've been tasked with replacing the non-printing characters \x00 through \x1F (NUL through US) with their vi editor notation. For example, wherever there is a BEL character (\x07), I replace it with ^G.

The file (called input3) I have to convert contains the following:

:Control-R:
:Escape:   
:Control-T:
:Control-_:

My teacher placed the non-printables on either side of the colons. My solution has to use the Unix utilities, in particular sed.

My Solution:

I started with the following sed command, which handles only the Control-T character:

cat input3 | sed 's/\024/^T/g' 

But it doesn't work; it just sends the same file to standard output. Is there something wrong with my sed command? My locale is POSIX and I'm using the C shell. This has to be done in the C shell.

TwilightSparkleTheGeek asked Mar 27 '26 16:03


2 Answers

The pattern you used does not mean what you think it means: sed treats \024 as simply the string "024". If you look at the sed escapes reference posted by @Alex, there is no special treatment for \0, so \0 becomes simply "0", and of course 24 remains "24". For example:

$ echo hello 024 joe | sed 's/\024/^T/g'
hello ^T joe

So, since you want to replace the character with octal value 024, you have to use the right format for octal values, as @Alex already wrote:

cat input3 | sed 's/\o024/^T/g'

You could use hex values too, if that's easier:

cat input3 | sed 's/\x14/^T/g'

(That's not a typo: octal 024 is 0x14 in hex.)
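If you want to double-check a conversion like that yourself, printf can do it. This is only a small sketch; it relies on printf reading a numeric argument with a leading 0 as octal and a leading 0x as hex. The comment lines assume an sh-style shell.

```shell
# printf treats a numeric argument with a leading 0 as octal.
printf '%x\n' 024    # octal 024 written in hex
printf '%d\n' 024    # the same value in decimal
printf '%o\n' 0x14   # hex 14 converted back to octal
```

The three lines print 14, 20 and 24.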

Based on the sed reference above, there's an even more readable version:

cat input3 | sed 's/\ct/^T/g'

That is, you can use \cX to match Control-X, where X is the letter or symbol from the caret notation. This works nicely for Control-T, Control-R and Control-_ in your example input. It won't work for Escape, though: Escape is ^[ in caret notation, and \c[ doesn't work. For that one you really need the octal or hex representation.
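The letter after the caret is not arbitrary: it is the ASCII character 64 positions above the control code. Here is a minimal sketch of that arithmetic, assuming an sh-compatible shell rather than csh. The function name caret_letter is made up for this illustration.

```shell
# Print the caret-notation character for a control code given in decimal.
# caret_letter is a hypothetical helper name, not a standard utility.
caret_letter() {
    # Add 64, turn the result into a three-digit octal escape, and let
    # the outer printf expand that escape into the actual character.
    printf "\\$(printf '%03o' $(( $1 + 64 )))\n"
}

caret_letter 20   # Control-T (octal 024, hex 14)
caret_letter 7    # BEL
caret_letter 27   # Escape
```

These calls print T, G and [, which matches the ^T, ^G and ^[ you see in vi.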

Extra tip: you can use hexdump to find the hex codes of your input, for example:

$ hexdump -C input3
00000000  3a 43 6f 6e 74 72 6f 6c  2d 52 3a 12 0a 3a 45 73  |:Control-R:..:Es|
00000010  63 61 70 65 3a 1b 0a 3a  43 6f 6e 74 72 6f 6c 2d  |cape:..:Control-|
00000020  54 3a 14 0a 3a 43 6f 6e  74 72 6f 6c 2d 5f 3a 1f  |T:..:Control-_:.|
00000030  0a  

So, to replace Escape:

cat input3 | sed 's/\x1b/^[/'
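If you want to try this without the original file, here is a sketch that rebuilds input3 from the bytes in the hexdump above and then replaces Escape. It assumes an sh-compatible shell rather than csh. It also builds the Escape character with printf and hands it to sed literally, in case your sed does not understand \x1b. The variable name esc is just for illustration.

```shell
# Recreate the sample file from the bytes shown in the hexdump.
printf ':Control-R:\022\n:Escape:\033\n:Control-T:\024\n:Control-_:\037\n' > input3

# Hold a literal Escape character (octal 033, hex 1b) in a variable.
esc=$(printf '\033')

# Replace every Escape with the two printable characters ^ and [.
sed "s/${esc}/^[/g" input3
```

Only the second line changes in the output; the other three still contain their raw control characters.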

Finally, to replace multiple patterns with one sed command, separate the s/// commands with ;, or use multiple -e flags. For example, these both work:

cat input3 | sed 's/\ct/^T/;s/\cr/^R/'
cat input3 | sed -e 's/\ct/^T/' -e 's/\cr/^R/'

Using multiple -e is more portable, as it works in older versions of sed too.
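For the whole range in the question, typing thirty s commands by hand gets tedious, so you could generate them instead. The following is only a sketch under a few assumptions: it is written for an sh-compatible shell (from csh you would save it to a file and run it with sh), the script name caret.sed is made up, and it writes the control characters into the sed script literally instead of relying on \x, \o or \c. It skips NUL, which a shell variable cannot hold, and newline, which sed uses as the line terminator.

```shell
# Write one "s" command per control character into a sed script file.
script=caret.sed
: > "$script"
code=1
while [ "$code" -le 31 ]; do
    if [ "$code" -ne 10 ]; then   # skip newline (decimal 10)
        # The control character itself, built from its octal code.
        ctrl=$(printf "\\$(printf '%03o' $code)")
        # Its caret letter is 64 positions higher in ASCII (0x14 -> 'T').
        letter=$(printf "\\$(printf '%03o' $((code + 64)))")
        # A backslash in a sed replacement has to be escaped.
        [ "$letter" = '\' ] && letter='\\'
        printf 's/%s/^%s/g\n' "$ctrl" "$letter" >> "$script"
    fi
    code=$((code + 1))
done

# Run the generated script over the same bytes as the sample file.
printf ':Control-R:\022\n:Escape:\033\n:Control-T:\024\n:Control-_:\037\n' | sed -f "$script"
```

On the sample input this prints the four lines ending in ^R, ^[, ^T and ^_. Note that a tab would come out as ^I as well, since it is in the same range.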

janos answered Apr 01 '26 06:04


I think you are missing the escape for the octal value 024.

Try this instead:

cat input3 | sed 's/\o024/^T/g'

You may find this sed escapes reference useful.

Stephan answered Apr 01 '26 07:04


