The answer to my question may exist on SO, but I have honestly looked hard and can't find it. The closest I got was this Q&A, but I could not reproduce their results on my machine (OSX 10.7.5, using bash).
Here is the issue reduced to its essence: I can't get sed to interpret \xnn (e.g. \x41 for A) as hex characters. What's driving me nuts in particular is this:
echo -e '\x41'
results in A - so the OS and its functions understand my hex code...
echo -e '\x41' | sed 's/A/B/'
results in B - as expected, since the hex code was converted to A before sed saw it.
But
echo A | sed 's/\x41/B/'
results in A - I would have expected B.
I have tried things like
echo A | LANG='C' sed 's/\x41/B/'
results in A
echo A | LANG='' sed 's/\x41/B/'
ditto...
echo A | sed 's/[\x41]/B/'
results in A
BUT...
echo A | sed 's/[\x41-\x41]/B/'
results in B
???
Am I being completely stupid? Or is there really something strange with sed? It can apparently interpret the hex code in a range, but I can't get it to be interpreted as a single character. What am I missing?
Please note - I am looking for answers that both explain why the above behaves the way it does and show how to insert a single hex code anywhere in a sed string on the OSX platform. This means both in the "search" and in the "replace" part of the s/ command. I have already shown that I can search for a single character with [\xnn-\xnn]; that's not the answer I am looking for.
Thanks in advance!
There is no general concept of what "the OS and its functions understand" -- each program, function, etc. understands its own particular set of metacharacters, escapes, etc. And it just happens that the sed shipped with OS X doesn't do hex codes (GNU sed does accept \xHH, as an extension). But bash does (if you ask it to), so you can have it translate them with $'' before calling sed:
$ echo A | sed $'s/\x41/B/'
B
Note that this also interprets other escape sequences before passing them to sed, so if you want to pass any escapes to sed, you need to double-escape them, or switch quote modes so only the relevant portion is in $'':
$ echo A | sed $'s/\\(\x41\\)/B\\1/' # double-escapes for sed's escape sequences
BA
$ echo A | sed 's/\('$'\x41''\)/B\1/' # equivalent with different quote modes
BA
$ echo A | sed 's/\(A\)/B\1/' # simplest equivalent version
BA
And if you want to interpret a hex escape held in a variable, rather than in a constant string, then you pretty much have to use the shell's printf builtin:
$ hex=41
$ echo A | sed "s/$(printf "\x$hex")/B/"
B
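The \x escape in printf is not required by POSIX, so in shells whose printf lacks it, one possible workaround is to go through an octal escape instead. This is a sketch only: hex_to_char is a hypothetical helper name, and it assumes a single byte other than NUL or newline, since command substitution cannot carry NUL and strips trailing newlines. It also shows the same trick on the replacement side of s///:

```shell
# hex_to_char HH: print the byte with hex value HH (hypothetical helper).
hex_to_char() {
  # The inner printf turns 0xHH into three octal digits, e.g. 41 -> 101.
  # The outer printf then expands the octal escape \101 into the byte itself.
  printf "\\$(printf '%03o' "0x$1")"
}

hex=41
echo A | sed "s/$(hex_to_char "$hex")/B/"     # search side; prints B
echo A | sed "s/A/$(hex_to_char 42)/"         # replacement side; prints B
```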
@GordonDavisson gave me inspiration to try two more things...
First off - I suddenly wondered if I was misinterpreting the output of
echo A | sed 's/[\x41-\x41]/B/'
I assumed this meant that sed understood the \xnn codes in a range, but I was wrong. When I tried
echo A | sed 's/[\x40-\x40]/B/'
I still got an output of B, although I thought I wasn't including A (\x41) in the range any more. Clearly, sed was interpreting my range in some other way than I expected. This was resolved by reading the re_format man page more carefully. It says:
[...] all other special characters, including `\', lose their special significance within a bracket expression.
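Applied to the examples above, that rule means [\x40-\x40] is read as the literal characters \, x and 4, then the range 0 through \, then x, 4 and 0. In ASCII, A (65) lies between 0 (48) and \ (92), so it matches. The same holds for the range 1 through \ in [\x41-\x41], while [\x41] is just the four characters \, x, 4 and 1, and so does not contain A. A quick way to check the arithmetic, as a sketch assuming an ASCII-based locale (the variable names are only illustrative):

```shell
# printf '%d' "'c" prints the numeric code of the character c.
lo=$(printf '%d' "'0")      # start of the accidental range in [\x40-\x40]
hi=$(printf '%d' "'\\")     # end of the range: the backslash itself
ch=$(printf '%d' "'A")      # the input character

echo "$lo $ch $hi"          # 48 65 92 in ASCII

in_range=no
if [ "$ch" -ge "$lo" ] && [ "$ch" -le "$hi" ]; then
  in_range=yes              # A sits inside the range, so the bracket matches it
fi
echo "$in_range"
```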
But then I got inspiration: if echo -e can expand the string, maybe I can use it to feed the string I want to sed...
echo "This?" | sed `echo -e 's/\x54\x68\x69\x73\x3F/\x59\x65\x73\x21/'`
Produces Yes!
echo "That?" | sed `echo -e 's/\x54\x68\x69\x73\x3F/\x59\x65\x73\x21/'`
Produces That?
Of course in this case the \xnn characters represent just plain ASCII - decoding the string just gives 's/This?/Yes!/', but it does establish the principle of inserting hex characters into a string for sed. The only thing this doesn't help clear up is what happens if your echo statement prints characters that would need to be escaped in sed. And it still doesn't address my fundamental question: how do I insert hex characters directly into a sed string? I still suspect it is possible... even more so after reading the documentation on sed (which claims to use "old" regular expressions, although the -E flag can make it use "extended" expressions, and directs the user to the re_format man page for details) and the re_syntax page, which is referenced by re_format. Between these, it really does look like adding a hex string should work directly...
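On the open question of decoded characters that sed itself treats as special, one common approach is to backslash-escape the text before splicing it into the pattern. The sketch below covers only the search side of s/// with basic regular expressions and / as the delimiter; sed_escape_pattern is a hypothetical helper name, not an existing command, and the replacement side would need a different set of characters escaped:

```shell
# sed_escape_pattern TEXT: backslash-escape BRE metacharacters and "/".
sed_escape_pattern() {
  # The bracket expression lists ] [ \ / . * ^ $ ; "|" is used as the
  # delimiter so that "/" needs no special handling inside this command.
  printf '%s\n' "$1" | sed 's|[][\\/.*^$]|\\&|g'
}

dot=$(printf '\056')                                   # octal escape for "." (hex 2e)
echo 'a.b' | sed "s/$dot/X/"                           # unescaped "." matches any character: X.b
echo 'a.b' | sed "s/$(sed_escape_pattern "$dot")/X/"   # escaped, matches only the dot: aXb
```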
I added this information as an "answer" rather than an "edit" to my question, as I believe it begins to answer my question... Looking forward to comments!