I have a file with the contents:
aaa.bbb.ccc ddd.eee.fff.ggg h.i.j.k
If I use the code:
awk '{sub(/\.$/, ""); print $1}' test.txt
Returns: aaa.bbb.ccc
awk '{sub(/\.$/, ""); print $3}' test.txt
Returns: h.i.j.k
I understand the sub function is used as: sub(regexp, replacement, target)
I don't understand this part, .$/, from the sub function. What is the .$?
Thanks
UPDATE
OK, I like your way of explaining things - thank you!
If I apply this to a real example:
/usr/bin/host 172.0.0.10
10.0.0.172.in-addr.arpa domain name pointer hostname.domain.com.
/usr/bin/host 172.0.0.10 | /bin/awk '{sub(/.$/, ""); print $5}' gives: hostname.domain.com
/usr/bin/host 172.0.0.10 | /bin/awk '{sub(/.$/, ""); print $1}' gives: 10.0.0.172.in-addr.arpa
- The sub function will match at the end of the line, as there is a "."
- What is the "" doing?
- I don't understand how awk is splitting things into columns.
sub(/regexp/, replacement, target)
sub(/\.$/, replacement, target)
Your regexp is \.$, not .$/
\ is the escape character. It escapes the character that follows it, stripping that character of its special regex meaning so that it is processed literally.
. in a regex matches any single character, unless it's escaped by \ as in your example, in which case it matches only the literal dot character.
$ simply means the end of the line (more precisely, the end of the string being matched, which here is the whole line).
Putting this together, \.$ is an escaped dot at the end of the line. It would match, for example, the period at the end of a paragraph.
In your example, the sub doesn't substitute anything because there is no . at the end of the line (your input ends with .k). So your first awk just prints the 1st column, and the other one prints the 3rd column.
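To see the difference between the escaped and the unescaped dot, here is a small sketch you can paste into a shell. The sample strings are made up for illustration, and the variable names are arbitrary.

```shell
# Line ending in a dot: \.$ matches, so the trailing dot is removed.
with_dot=$(printf '%s\n' 'aaa.bbb.ccc.' | awk '{sub(/\.$/, ""); print}')
echo "$with_dot"      # aaa.bbb.ccc

# Line ending in "k": \.$ does not match, so the line is left unchanged.
no_dot=$(printf '%s\n' 'h.i.j.k' | awk '{sub(/\.$/, ""); print}')
echo "$no_dot"        # h.i.j.k

# Unescaped .$ matches ANY last character, so here the "k" is deleted.
any_char=$(printf '%s\n' 'h.i.j.k' | awk '{sub(/.$/, ""); print}')
echo "$any_char"      # h.i.j.
```

The last case is why the backslash matters: without it, sub always chops one character off the end of the line, whatever that character is.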
Update
For your updated question.
By default, awk splits each line into columns on whitespace. So in your input, the columns are:
10.0.0.172.in-addr.arpa domain name pointer hostname.domain.com.
|----------$1-----------|--$2--|-$3-|--$4---|----------$5--------|
In your sub command, awk finds the dot at the end of the line and replaces it with "", which is the empty string (i.e. it just deletes it).
So your 1st command, {sub(/.$/, ""); print $5}, prints the 5th column, hostname.domain.com, after replacing the . at the end with nothing (deleting it). It's worth noting that in this regex you no longer escape the ., so the pattern matches any character at the end of the line and deletes it (it happens to be a . in your input).
Your other command, {sub(/.$/, ""); print $1}, deletes the character at the very end of the line and then just prints the first column, 10.0.0.172.in-addr.arpa
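The splitting and the effect of sub on the columns can be checked without running host. The sketch below feeds awk a hard-coded line modeled on the output shown in the question (an assumption standing in for the real command); the variable names are arbitrary.

```shell
# Hard-coded stand-in for the output of /usr/bin/host.
line='10.0.0.172.in-addr.arpa domain name pointer hostname.domain.com.'

# NF is the number of whitespace-separated fields awk found on the line.
nf=$(printf '%s\n' "$line" | awk '{print NF}')
echo "$nf"            # 5

# Without sub, the 5th field still carries the trailing dot.
raw=$(printf '%s\n' "$line" | awk '{print $5}')
echo "$raw"           # hostname.domain.com.

# sub() edits the whole line ($0); awk then re-splits it, so $5 loses the dot.
clean=$(printf '%s\n' "$line" | awk '{sub(/\.$/, ""); print $5}')
echo "$clean"         # hostname.domain.com
```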
You can also set custom column separators in awk. I recommend reading an introduction or a tutorial on awk to get a better understanding of how it works, e.g. a simple awk tutorial.
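As a quick illustration of a custom separator, the -F option tells awk what to split on. This is only a sketch with a made-up input string; here the dot itself becomes the separator.

```shell
# -F. makes "." the field separator, so each dot-separated part is a field.
parts=$(printf '%s\n' 'aaa.bbb.ccc' | awk -F. '{print NF}')
echo "$parts"         # 3

second=$(printf '%s\n' 'aaa.bbb.ccc' | awk -F. '{print $2}')
echo "$second"        # bbb
```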
sub(regexp, replacement, target)
Here the regex is \.$, which matches a dot at the end. In sub(/\.$/, "") we didn't specify a target, so it defaults to $0, i.e. the whole line. If you specify a target, the dot is removed only from the end of that particular column.
awk '{sub(/\.$/, ""); print $1}' test.txt
Removes the dot only if one is present at the end of the line, then prints column 1. If there is no dot at the end, no replacement occurs.
awk '{sub(/\.$/, ""); print $3}' test.txt
Removes the dot at the end of the line and prints only column 3. Because there is no dot at the end, it returns the third (last) column as it is.
Example:
$ cat file
aaa.bbb.ccc. ddd.eee.fff.ggg h.i.j.k.
$ awk '{sub(/\.$/, ""); print $1}' file
aaa.bbb.ccc.
$ awk '{sub(/\.$/, ""); print $3}' file
h.i.j.k
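To illustrate the optional target argument described above, here is a minimal sketch using the same sample line. It passes $1 as the third argument so that the $ anchor applies to the end of the first column instead of the end of the whole line. The variable names are arbitrary.

```shell
line='aaa.bbb.ccc. ddd.eee.fff.ggg h.i.j.k.'

# No target: sub works on $0, so only the dot at the end of the line is removed.
default_target=$(printf '%s\n' "$line" | awk '{sub(/\.$/, ""); print}')
echo "$default_target"   # aaa.bbb.ccc. ddd.eee.fff.ggg h.i.j.k

# Target $1: $ now anchors to the end of the first field, so its dot is removed.
first_field=$(printf '%s\n' "$line" | awk '{sub(/\.$/, "", $1); print}')
echo "$first_field"      # aaa.bbb.ccc ddd.eee.fff.ggg h.i.j.k.
```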