Running the following command gives me the string "utf-8", but I expected it to return the string "tralala":
echo "=?utf-8?B?tralala" | awk -F "?B?" '{print $2 }'
Why is that? What delimiter should I use to get the string "tralala"?
? is a regex metacharacter that means zero or one matches of the preceding atom, so -F "?B?" is not read as the literal text ?B?. (I'm surprised awk didn't complain about the one at the start.)
Try this instead:
echo "=?utf-8?B?tralala" | awk -F '\\?B\\?' '{print $2 }'
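To see why the original command printed "utf-8", it can help to print the field count next to the second field. This is only a sketch: it assumes awk reads the leading ? of the unescaped separator as a literal character (which the output in the question suggests), and it spells that reading out explicitly as '[?]B?', i.e. a literal ? followed by an optional B. The variable names are just for illustration.

```shell
line='=?utf-8?B?tralala'

# Assumed reading of the unescaped separator: literal "?" then an OPTIONAL "B".
# Every "?" in the input is then a separator on its own, so the line is cut
# into four fields: "=", "utf-8", "" and "tralala".
loose=$(printf '%s\n' "$line" | awk -F '[?]B?' '{ print NF ": " $2 }')

# Both question marks literal: the only separator is the full text "?B?".
strict=$(printf '%s\n' "$line" | awk -F '[?]B[?]' '{ print NF ": " $2 }')

echo "$loose"    # 4: utf-8
echo "$strict"   # 2: tralala
```

With the optional B, the very first ? already ends field 1, which is why $2 comes out as "utf-8".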
Awk delimiters are NOT strings; they are "Field Separators" (hence the variable named FS), which are a type of Extended Regular Expression with some additional features. For example, a single blank char as the field separator, when not inside square brackets, means: separate on every chain of contiguous white space and ignore leading and trailing white space on each record.
The differences between a string, a regular expression, and a field separator are very important to be aware of. You sometimes also see the word "pattern" used; do not use that term, as it has no single meaning (or too many possible ones).
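The single-blank special case is one place where a field separator behaves differently from a plain regular expression. A minimal sketch, using a made-up input line with leading, repeated and trailing blanks:

```shell
line='  a   b  '

# FS is a single blank: runs of white space separate fields, and leading
# and trailing white space is ignored, so only "a" and "b" are fields.
default_nf=$(printf '%s\n' "$line" | awk -F ' ' '{ print NF }')

# FS is a blank inside square brackets: an ordinary regex, so each of the
# seven blanks is its own separator and the empty fields are counted too.
bracket_nf=$(printf '%s\n' "$line" | awk -F '[ ]' '{ print NF }')

echo "$default_nf"   # 2
echo "$bracket_nf"   # 8
```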
A ? is an RE metacharacter, so you need to tell awk not to treat it as such in your case by either of these methods:
$ echo "=?utf-8?B?tralala" | awk -F '[?]B[?]' '{print $2}'
tralala
$ echo "=?utf-8?B?tralala" | awk -F '\\?B\\?' '{print $2}'
tralala
You don't strictly need to do that for the first ? as its metacharacter functionality is not applicable when it's the first char in an RE:
$ echo "=?utf-8?B?tralala" | awk -F '?B[?]' '{print $2}'
tralala
$ echo "=?utf-8?B?tralala" | awk -F '?B\\?' '{print $2}'
tralala
but IMHO it's best to do it anyway for clarity and future-proofing.
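If the delimiter really is meant as a plain string, another option is to bypass FS altogether and use awk's string functions index() and substr(), which do no regex matching. This is a different technique from the escaping shown above, offered only as a sketch; the variable names sep, pos and after are made up for the example. It also assumes the separator contains no backslashes, since -v applies escape processing to the assigned value.

```shell
line='=?utf-8?B?tralala'
sep='?B?'

# index() and substr() treat sep as a plain string, so nothing needs escaping.
after=$(printf '%s\n' "$line" | awk -v sep="$sep" '{
    pos = index($0, sep)                       # 1-based position of sep, 0 if absent
    if (pos > 0) print substr($0, pos + length(sep))
}')

echo "$after"    # tralala
```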