Hex codes in sed - not behaving as expected on OSX

Tags: macos, sed, hex

The answer to my question may exist on SO, but I have honestly looked hard and can't find it. The closest I got was this Q&A, but I could not reproduce its results on my machine (OSX 10.7.5, using bash).

Here is the issue reduced to its essence: I can't get sed to interpret \xnn (e.g. \x41 for A) as hex characters. What's driving me nuts in particular is this:

echo -e '\x41' 

results in A - so the OS and its functions understand my hex code...

echo -e '\x41' | sed 's/A/B/'

results in B - as expected, since the hex code was converted to A before sed saw it

But

echo A | sed 's/\x41/B/'

results in A - I would have expected B

I have tried things like

echo A | LANG='C' sed 's/\x41/B/'

results in A

echo A | LANG='' sed 's/\x41/B/'

ditto...

echo A | sed 's/[\x41]/B/'

results in A

BUT...

echo A | sed 's/[\x41-\x41]/B/'

results in B ???

Am I being completely stupid? Or is there really something strange with sed? It can apparently interpret the hex code in a range, but I can't get it to be interpreted as a single character. What am I missing?

Please note: I am looking for answers that both explain why the above behaves the way it does and show a way to insert a single hex code anywhere in a sed string on the OSX platform, in both the "search" and the "replace" part of the s/ command. Since I have already shown that I can search for a single character with [\xnn-\xnn], that is not the answer I am looking for.

Thanks in advance!

Floris asked Feb 15 '13 05:02


2 Answers

There is no general concept of what "the OS and its functions understand": each program, function, etc. understands its own particular set of metacharacters, escapes, and so on. It just happens that sed (at least the BSD sed that ships with OS X; GNU sed does accept \xHH) doesn't do hex codes. But bash does (if you ask it to), so you can have it translate them before calling sed with $'':

$ echo A | sed $'s/\x41/B/'
B

Note that this also interprets other escape sequences before passing them to sed, so if you want to pass any escapes to sed, you need to double-escape them, or switch quote modes so only the relevant portion is in $'':

$ echo A | sed $'s/\\(\x41\\)/B\\1/' # double-escapes for sed's escape sequences
BA
$ echo A | sed 's/\('$'\x41''\)/B\1/' # equivalent with different quote modes
BA
$ echo A | sed 's/\(A\)/B\1/' # simplest equivalent version
BA
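To see what the double escaping accomplishes, here is a minimal sketch (bash assumed) that never runs sed at all: it only prints the script text bash builds from the two quoting styles above, which shows that both hand sed the same string.

```bash
#!/usr/bin/env bash
# Inside $'...', \\ collapses to a single backslash and \x41 becomes A.
printf '%s\n' $'s/\\(\x41\\)/B\\1/'     # prints: s/\(A\)/B\1/

# Mixed quote modes: '...' then $'\x41' then '...', concatenated into one word.
printf '%s\n' 's/\('$'\x41''\)/B\1/'    # prints: s/\(A\)/B\1/
```

Printing the script this way is a cheap check whenever a mix of bash escapes and sed escapes gets confusing.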

And if you want to interpret hex escapes in a variable rather than a constant string, then you pretty much have to use the shell's printf builtin:

$ hex=41
$ echo A | sed "s/$(printf "\x$hex")/B/"
B
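The same printf trick works for the replacement side too. Below is a small sketch wrapping it in a function; the name `hexsub` is hypothetical (not a standard command), and it assumes bash's printf builtin, which understands \xHH. The decoded characters are pasted in unescaped, so a code that decodes to a sed metacharacter (/, ., *, &, \ and so on) would change the meaning of the command.

```bash
#!/usr/bin/env bash
# hexsub FROM_HEX TO_HEX: hypothetical helper, reads stdin and writes stdout.
# Replaces every character with code FROM_HEX by the character with code TO_HEX.
# Caveat: no escaping is done, so metacharacters in either side are NOT safe.
hexsub() {
  local from to
  from=$(printf "\\x$1")   # e.g. 41 -> A
  to=$(printf "\\x$2")     # e.g. 42 -> B
  sed "s/$from/$to/g"
}

echo ABA | hexsub 41 42    # prints: BBB
```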
Gordon Davisson answered Oct 17 '22 10:10


@GordonDavisson gave me inspiration to try two more things...

First off - I suddenly wondered if I was misinterpreting the output of

echo A | sed 's/[\x41-\x41]/B/'

I assumed this meant that sed understood the \xnn codes in a range, but I was wrong. When I tried

echo A | sed 's/[\x40-\x40]/B/'

I still got an output of B, although I thought I was no longer including A (\x41) in the range. Clearly, sed was interpreting my range differently from what I expected. Reading the re_format man page more carefully resolved this. It says

[...] all other special characters, including `\', lose their special significance within a bracket expression.
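Taking that rule literally explains the surprising B. Because the backslash is an ordinary character inside brackets, [\x40-\x40] is parsed as the characters \, x and 4, then the range 0-\, then x, 4 and 0; likewise [\x41-\x41] contains the range 1-\. The sketch below (bash assumed, and assuming ranges follow byte values as in the C locale) prints the code points involved and checks that A lies inside the range.

```bash
#!/usr/bin/env bash
# A leading quote makes printf print the numeric code of the next character.
printf '%d %d %d %d\n' "'0" "'1" "'A" "'\\"   # prints: 48 49 65 92

# A (65) lies inside both 48..92 (0-\) and 49..92 (1-\),
# so either bracket expression matches A.
a=$(printf '%d' "'A")
lo=$(printf '%d' "'0")
hi=$(printf '%d' "'\\")
(( lo <= a && a <= hi )) && echo "A falls inside the range 0-\\"
```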

But then I got inspiration: if echo -e can expand the string, maybe I can use it to feed the string I want to sed...

echo "This?" | sed `echo -e 's/\x54\x68\x69\x73\x3F/\x59\x65\x73\x21/'`

Produces Yes!

echo "That?" | sed `echo -e 's/\x54\x68\x69\x73\x3F/\x59\x65\x73\x21/'`

Produces That?
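A slightly more robust variant of the same idea is sketched below, assuming bash. It uses printf instead of echo -e (printf is generally the more portable choice, and bash's printf builtin understands \xHH) and quotes the result, so the decoded script is not word-split or glob-expanded; unquoted, the ? in This? is a glob character.

```bash
#!/usr/bin/env bash
# Decode the hex escapes once, then pass the script to sed as one quoted word.
script=$(printf 's/\x54\x68\x69\x73\x3F/\x59\x65\x73\x21/')
printf '%s\n' "$script"        # prints: s/This?/Yes!/
echo "This?" | sed "$script"   # prints: Yes!
echo "That?" | sed "$script"   # prints: That?
```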

Of course, in this case the \xnn characters represent plain ASCII (decoding the string just gives 's/This?/Yes!/'), but it does establish the principle of inserting hex characters into a string for sed. What this doesn't clear up is what happens if the echo statement prints characters that would need to be escaped in sed. And it still doesn't address my fundamental question: how do I insert hex characters directly into a sed string? I still suspect it is possible, even more so after reading the documentation on sed (which says it uses "old" regular expressions unless the -E flag selects "extended" ones, and directs the user to the re_format man page for details) and the re_syntax page, which is referenced by re_format. Between these, it really does look like a hex escape should work directly...
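One way to handle the "characters that would need to be escaped" concern on the search side is to decode first and then backslash-escape anything special in a basic regex. This is a sketch assuming bash (its printf %b expands \xHH like echo -e); the helper name `hex_to_bre` is hypothetical. It only covers the pattern: the replacement side has different special characters (&, \ and the delimiter) and would need its own escaping.

```bash
#!/usr/bin/env bash
# hex_to_bre 'ESCAPES': hypothetical helper. Decodes \xHH escapes, then puts a
# backslash before each character special in a BRE or used as the / delimiter:
#   ] [ \ . * ^ $ /
# The escaping command itself uses | as its delimiter so / needs no escape.
hex_to_bre() {
  printf '%b' "$1" | sed 's|[][\.*^$/]|\\&|g'
}

pattern=$(hex_to_bre '\x61\x2E\x62')    # a.b becomes a\.b
echo "a.b axb" | sed "s/$pattern/X/g"   # prints: X axb  (an unescaped a.b would also match axb)
```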

I added this information as an "answer" rather than an "edit" to my question, as I believe it begins to answer my question... Looking forward to comments!

Floris answered Oct 17 '22 10:10