
Why can't I use the string "?B?" as a delimiter in awk?

Tags:

awk

Running the following command returns the string "utf-8". I expected it to return the string "tralala":

echo "=?utf-8?B?tralala" | awk -F "?B?" '{print $2 }'

Why is that? What delimiter should I use to get the string "tralala"?

kgroutsis asked Mar 17 '23 17:03


2 Answers

? is a regex metacharacter that means zero or one occurrence of the preceding atom, and awk interprets the -F value as a regex. (I'm surprised awk didn't complain about the one at the start.)

Try echo "=?utf-8?B?tralala" | awk -F '\\?B\\?' '{print $2 }' instead.
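To see why the original command prints "utf-8", it can help to dump every field. This is a minimal sketch, assuming the awk in use reads the leading ? literally (which the result in the question suggests); under that assumption "?B?" behaves like the explicit regex '[?]B?', meaning a literal ? optionally followed by B. The variable names record and fields are only for illustration.

```shell
# Assumption: a leading "?" is taken literally, so -F "?B?" acts like -F '[?]B?'
# (a literal "?" followed by an optional "B"). The explicit form is used here
# because it is unambiguous across awk implementations.
record='=?utf-8?B?tralala'

# Print each field with its index so empty fields are visible.
fields=$(printf '%s\n' "$record" |
  awk -F '[?]B?' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')

printf '%s\n' "$fields"
```

Every lone ? also matches that separator, so the record splits into four fields: "=", "utf-8", an empty string, and "tralala". That is why $2 is "utf-8" and "tralala" only shows up as $4.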

Etan Reisner answered Apr 25 '23 02:04


Awk delimiters are NOT strings. They are "Field Separators" (hence the variable named FS), which are a type of Extended Regular Expression with some additional features. For example, a single blank char as the field separator, when not inside square brackets, means: separate fields on every chain of contiguous white space and ignore leading and trailing white space on each record.
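The special handling of a single blank can be checked with a quick comparison. This is a small sketch; the sample record and the variable names default_count and bracket_count are made up for illustration.

```shell
# Sample record with leading, doubled, and trailing spaces (illustrative only).
record='  a  b '

# FS=" " : runs of white space separate fields; leading/trailing space is ignored.
default_count=$(printf '%s\n' "$record" | awk -F ' ' '{ print NF }')

# FS="[ ]" : a plain regex, so every single space is its own separator.
bracket_count=$(printf '%s\n' "$record" | awk -F '[ ]' '{ print NF }')

printf 'FS=" " gives %s fields, FS="[ ]" gives %s fields\n' \
  "$default_count" "$bracket_count"
```

With the single blank, only "a" and "b" are counted (2 fields). With the bracket expression, each of the five spaces separates two fields, so the count is 6 and most of those fields are empty.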

The differences between a string, a regular expression, and a field separator are very important to be aware of. You will sometimes also see the word "pattern" used; do not use that term, as it has no (or too many possible) meanings.

A ? is an RE metacharacter so you need to tell awk not to treat it as such in your case by either of these methods:

$ echo "=?utf-8?B?tralala" | awk -F '[?]B[?]' '{print $2}'
tralala
$ echo "=?utf-8?B?tralala" | awk -F '\\?B\\?' '{print $2}'
tralala

You don't strictly need to do that for the first ?, as its metacharacter functionality is not applicable when it's the first char in an RE:

$ echo "=?utf-8?B?tralala" | awk -F '?B[?]' '{print $2}'
tralala
$ echo "=?utf-8?B?tralala" | awk -F '?B\\?' '{print $2}'
tralala

but IMHO it's best to do it anyway for clarity and future-proofing.

Ed Morton answered Apr 25 '23 03:04