 

AWK sub function syntax

Tags: regex, awk

I have a file with the contents:

aaa.bbb.ccc ddd.eee.fff.ggg h.i.j.k

If I use this code:

awk '{sub(/\.$/, ""); print $1}' test.txt
Returns: aaa.bbb.ccc

awk '{sub(/\.$/, ""); print $3}' test.txt
Returns: h.i.j.k

I understand the sub function is used as: sub(regexp, replacement, target)

I don't understand this part of the sub function: .$/. What is the .$?

Thanks

UPDATE

OK, I like your way of explaining things. Thank you!

If I apply this to a real example:

/usr/bin/host 172.0.0.10

10.0.0.172.in-addr.arpa domain name pointer hostname.domain.com.

  1. /usr/bin/host 172.0.0.10 | /bin/awk '{sub(/.$/, ""); print $5}' gives: hostname.domain.com

  2. /usr/bin/host 172.0.0.10 | /bin/awk '{sub(/.$/, ""); print $1}' gives: 10.0.0.172.in-addr.arpa

- The sub function will match at the end of the line, as there is a "." there.
- What is the "" doing?
- I don't understand how awk is splitting things into columns.

Matzuba asked Sep 30 '14 05:09


2 Answers

sub(/regexp/, replacement, target)
sub(/\.$/, replacement, target)

Your regexp is \.$, not .$/

\ is the escape character. It escapes the character that follows it, stripping that character of its special regex meaning so that it is matched literally.

. in a regex matches any single character, unless it is escaped by \ as in your example, in which case it matches only the dot character itself.

$ anchors the match to the end of the string being searched, which here is the whole line.

Putting this together, \.$ matches a literal dot at the end of the line. It would match, for example, any line that ends in a period.

In your example, the sub doesn't substitute anything because there is no . at the end of the line (your input ends with .k). So your first awk just prints the 1st column, and the other one prints the 3rd column.
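To make the difference between the escaped and the unescaped dot concrete, here is a minimal sketch. The two input lines (abc. and abcx) are made up for illustration and are not from the question.

```shell
# Escaped dot: only a literal "." at the end of the line is removed.
printf 'abc.\nabcx\n' | awk '{ sub(/\.$/, ""); print }'
# prints:
# abc
# abcx

# Unescaped dot: the last character is removed, whatever it is.
printf 'abc.\nabcx\n' | awk '{ sub(/.$/, ""); print }'
# prints:
# abc
# abc
```

The second form only looks equivalent when every line happens to end in a dot.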

Update

For your updated question.

By default, awk splits each line into columns (fields) on runs of whitespace. Thus in your input, the columns are like this:

 10.0.0.172.in-addr.arpa domain name pointer hostname.domain.com.
|----------$1-----------|--$2--|-$3-|--$4---|----------$5--------|

In your sub command, awk finds the dot at the end of the line and replaces it with "", which is the empty string (i.e. it just deletes it).

Your first command, {sub(/.$/, ""); print $5}, deletes the last character of the line and then prints the 5th column, which is now hostname.domain.com. It's worth noting that in this regex the . is no longer escaped, so the pattern matches any character at the end of the line and deletes it (it just happens to be a . in your input).

Your other command, {sub(/.$/, ""); print $1}, deletes the character at the very end of the line and then just prints the first column, 10.0.0.172.in-addr.arpa.
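As a rough illustration of the order of events, the sketch below feeds the host output in as a literal string (so it does not depend on a DNS lookup) and prints the 5th column before and after the substitution. The variable name line is just a placeholder for this example.

```shell
# Sample line standing in for the output of /usr/bin/host (assumed, not looked up).
line='10.0.0.172.in-addr.arpa domain name pointer hostname.domain.com.'

# NF is the number of columns; $5 is printed before and after sub() edits the line.
echo "$line" | awk '{ print NF; print $5; sub(/\.$/, ""); print $5 }'
# prints:
# 5
# hostname.domain.com.
# hostname.domain.com
```

Because sub() with no target changes the whole line, awk re-splits the columns afterwards, which is why $5 loses its trailing dot.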

You can also set custom column separators in awk. I recommend you read an introduction or tutorial on awk to get a better understanding of how it works, e.g. a simple awk tutorial.
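A custom separator can be given with the -F option. This is only a small sketch, using the hostname from the question as input and a dot as the separator:

```shell
# Split on "." instead of whitespace: $1 is the first label, NF the number of labels.
echo 'hostname.domain.com' | awk -F'.' '{ print $1; print NF }'
# prints:
# hostname
# 3
```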

confused00 answered Oct 21 '22 23:10


sub(regexp, replacement, target)

Here the regex is \.$, which matches a dot at the end of the string. In sub(/\.$/, "") no target is given, so awk uses $0, i.e. the whole line. If you specify a target such as $1, the substitution removes a trailing dot from that column only.
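For comparison, here is a sketch of the three-argument form with $1 as the target. The input line is a sample with trailing dots on the first and last columns, chosen for illustration.

```shell
# With $1 as the target, $ anchors to the end of the first column, not the line.
echo 'aaa.bbb.ccc. ddd.eee.fff.ggg h.i.j.k.' | awk '{ sub(/\.$/, "", $1); print }'
# prints: aaa.bbb.ccc ddd.eee.fff.ggg h.i.j.k.
```

Only the first column loses its dot; the dot at the end of the line is left alone.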

awk '{sub(/\.$/, ""); print $1}' test.txt

This removes a dot only if it is at the very end of the line, then prints column 1. If the line does not end in a dot, no replacement occurs.

awk '{sub(/\.$/, ""); print $3}' test.txt

This removes the dot at the end of the line, if there is one, and prints column 3. Because your input has no dot at the end, it returns the third (and last) column as it is.

Example:

$ cat file
aaa.bbb.ccc. ddd.eee.fff.ggg h.i.j.k.
$ awk '{sub(/\.$/, ""); print $1}' file
aaa.bbb.ccc.
$ awk '{sub(/\.$/, ""); print $3}' file
h.i.j.k

Avinash Raj answered Oct 21 '22 22:10