If we have ip=192.168.0.1 and we call split(ip, myArray, "."), myArray will contain "192" at position 1, "168" at position 2, "0" at position 3 and "1" at position 4.
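For reference, here is a minimal sketch that reproduces the behaviour described. The split_ip wrapper name is just an illustrative assumption; only the awk program inside it matters.

```shell
# Hypothetical helper: split an IP address on "." and print each piece
# together with its array index.
split_ip() {
    awk -v ip="$1" 'BEGIN {
        # "." is a one-character string here, so awk uses it as a literal separator
        n = split(ip, myArray, ".")
        for (i = 1; i <= n; i++)
            print i, myArray[i]
    }'
}

split_ip 192.168.0.1
```

This prints four lines, one per octet, numbered from 1.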
My question is: why does awk not interpret the "." as the "any character" regular expression?
What would I need to do if I wanted awk to interpret the "." as the "any character" regular expression for matching?
Will this behaviour be consistent across all awk implementations?
This is really a dark corner of awk....
I had the same question about 5 years ago. I submitted it as a bug, talked to a gawk developer, and finally got it cleared up: it is a "feature".
Here is the ticket: https://lists.gnu.org/archive/html/bug-gawk/2013-03/msg00009.html
split(str, array, magic)
For magic:
When you use a non-empty string (quoted with "", as in "..."), awk checks the length of the string. If it is a single character, it is used as a literal string (they call it a separator); the one exception is a single space " ", which splits on runs of whitespace. If the string is longer than one character, it is treated as a dynamic regex.
When you use a static regex, that is, the /.../ format, it is always treated as a regex, no matter how long the expression is.
That is:
"." - literal "." (period)
"[" - literal "["
"{" - literal "{"
".*" - regex
/./ - regex
/whatever/ - regex
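A quick way to check these rules yourself is to compare the piece counts that split() returns. This is only a sketch: the count_pieces name and the sample strings are made up for illustration.

```shell
# Hypothetical helper: print how many pieces split() produces for
# one-character string separators versus a longer string separator.
count_pieces() {
    awk 'BEGIN {
        print split("a.b:c", parts, ".")      # one-character string: literal period
        print split("x[y[z", parts, "[")      # one-character string: literal bracket
        print split("a.b:c", parts, "[.:]")   # longer string: dynamic regex (period or colon)
    }'
}

count_pieces
```

The first call splits only at the period, the second only at each "[", and the third at both the period and the colon because the three-character string is compiled as a regex.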
If you want awk to treat . (period) as a regex metacharacter, you should use split(foo,bar,/./).
But if you split on any character, every character becomes a separator, so the array ends up holding nothing but empty strings. Make sure this is what you really want.
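To see the effect of the regex form, the sketch below splits on /./ and counts how many resulting elements are non-empty. The split_any_char name is an assumption made for this example.

```shell
# Hypothetical helper: split a string on the regex /./ (any character)
# and report the number of pieces and how many of them are non-empty.
split_any_char() {
    awk -v s="$1" 'BEGIN {
        n = split(s, parts, /./)      # static regex: "." is a metacharacter here
        nonempty = 0
        for (i = 1; i <= n; i++)
            if (parts[i] != "")
                nonempty++
        print n, nonempty
    }'
}

split_any_char 192.168.0.1
```

The second number printed is 0: every character was consumed as a separator, so no element carries any text.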