Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using multiple delimiters in awk

People also ask

Can awk use multiple field separators?

As you can see, you can combine more than one delimiter in the AWK field separator to get specific information.

How do I use NF in awk?

NF is a predefined variable whose value is the number of fields in the current record. awk automatically updates the value of NF each time it reads a record. No matter how many fields there are, the last field in a record can be represented by $NF . So, $NF is the same as $7 , which is ' example.


The delimiter can be a regular expression.

awk -F'[/=]' '{print $3 "\t" $5 "\t" $8}' file

Produces:

tc0001   tomcat7.1    demo.example.com  
tc0001   tomcat7.2    quest.example.com  
tc0001   tomcat7.5    www.example.com

Good news! awk field separator can be a regular expression. You just need to use -F"<separator1>|<separator2>|...":

awk -F"/|=" -vOFS='\t' '{print $3, $5, $NF}' file

Returns:

tc0001  tomcat7.1  demo.example.com
tc0001  tomcat7.2  quest.example.com
tc0001  tomcat7.5  www.example.com

Here:

  • -F"/|=" sets the input field separator to either / or =.

  • -vOFS='\t' is using the -v flag for setting a variable. OFS is the default variable for the Output Field Separator and it is set to the tab character. The flag is necessary because there is no built-in for the OFS like -F.

  • {print $3, $5, $NF} prints the 3rd, 5th and last fields based on the input field separator.


See another example:

$ cat file
hello#how_are_you
i#am_very#well_thank#you

This file has two fields separators, # and _. If we want to print the second field regardless of the separator being one or the other, let's make both be separators!

$ awk -F"#|_" '{print $2}' file
how
am

Where the files are numbered as follows:

hello#how_are_you           i#am_very#well_thank#you
^^^^^ ^^^ ^^^ ^^^           ^ ^^ ^^^^ ^^^^ ^^^^^ ^^^
  1    2   3   4            1  2   3    4    5    6

If your whitespace is consistent you could use that as a delimiter, also instead of inserting \t directly, you could set the output separator and it will be included automatically:

< file awk -v OFS='\t' -v FS='[/ ]' '{print $3, $5, $NF}'

Another one is to use the -F option but pass it regex to print the text between left and or right parenthesis ().

The file content:

528(smbw)
529(smbt)
530(smbn)
10115(smbs)

The command:

awk -F"[()]" '{print $2}' filename

result:

smbw
smbt
smbn
smbs

Using awk to just print the text between []:

Use awk -F'[][]' but awk -F'[[]]' will not work.

http://stanlo45.blogspot.com/2020/06/awk-multiple-field-separators.html


For a field separator of any number 2 through 5 or letter a or # or a space, where the separating character must be repeated at least 2 times and not more than 6 times, for example:

awk -F'[2-5a# ]{2,6}' ...

I am sure variations of this exist using ( ) and parameters


Perl one-liner:

perl -F'/[\/=]/' -lane 'print "$F[2]\t$F[4]\t$F[7]"' file

These command-line options are used:

  • -n loop around every line of the input file, put the line in the $_ variable, do not automatically print every line

  • -l removes newlines before processing, and adds them back in afterwards

  • -a autosplit mode – perl will automatically split input lines into the @F array. Defaults to splitting on whitespace

  • -F autosplit modifier, in this example splits on either / or =

  • -e execute the perl code

Perl is closely related to awk, however, the @F autosplit array starts at index $F[0] while awk fields start with $1.