
AWK to use multiple spaces as delimiter

Tags: unix, awk

I am using the command below to join two files on their first two columns.

awk 'NR==FNR{a[$1,$2]=substr($0,3);next} ($1,$2) in a{print $0, a[$1,$2] > "br0102_3.txt"}' br01.txt br02.txt

By default, awk uses whitespace as the field separator. But a field in my files may itself contain a single space between two words, e.g.

File 1:

ABCD               TEXT1 TEXT2                     123123112312312312312312312312312312
BCDEFG             TEXT3TEXT4                      133123123123123123123123123125423423
QWERT              TEXT5TEXT6                      123123123123125456678786789698758567

File 2:

ABCD               TEXT1 TEXT2                     12312312312312312312312312312
BCDEFG             TEXT3TEXT4                      31242342342342342342342342343
MNHT               TEXT8 TEXT9                     31242342342342342342342342343

I want the result file to be:

ABCD               TEXT1 TEXT2                     123123112312312312312312312312312312 12312312312312312312312312312
BCDEFG             TEXT3TEXT4                      133123123123123123123123123125423423 31242342342342342342342342343
QWERT              TEXT5TEXT6                      123123123123125456678786789698758567
MNHT               TEXT8 TEXT9                     31242342342342342342342342343

Any hints?

Apurv asked Nov 10 '14 11:11


2 Answers

awk accepts a regular expression as the value of FS, so you can specify one that matches two or more whitespace characters, for example -F '[[:space:]][[:space:]]+'. A single space inside a field then no longer splits it:

$ awk '{print NF}' File2
4
3
4

$ awk -F '[[:space:]][[:space:]]+' '{print NF}' File2
3
3
3
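
Applied to the join from the question, that separator could be used roughly as follows. This is only a sketch: it assumes the columns are always separated by at least two spaces, it uses a temporary directory (workdir) that is not part of the question, and it keeps unmatched rows from both files because the desired output in the question shows them.

```shell
workdir=$(mktemp -d)

# Sample rows taken from the question; runs of 2+ spaces separate the columns.
cat > "$workdir/br01.txt" <<'EOF'
ABCD               TEXT1 TEXT2                     123123112312312312312312312312312312
BCDEFG             TEXT3TEXT4                      133123123123123123123123123125423423
QWERT              TEXT5TEXT6                      123123123123125456678786789698758567
EOF
cat > "$workdir/br02.txt" <<'EOF'
ABCD               TEXT1 TEXT2                     12312312312312312312312312312
BCDEFG             TEXT3TEXT4                      31242342342342342342342342343
MNHT               TEXT8 TEXT9                     31242342342342342342342342343
EOF

awk -F '[[:space:]][[:space:]]+' '
NR == FNR {                       # first file: remember each line by its key, in order
    key = $1 SUBSEP $2
    line[key] = $0
    order[++n] = key
    next
}
{                                 # second file
    key = $1 SUBSEP $2
    if (key in line)
        line[key] = line[key] " " $3    # match: append the third column
    else {
        line[key] = $0                  # no match: keep the row as it is
        order[++n] = key
    }
}
END { for (i = 1; i <= n; i++) print line[order[i]] }
' "$workdir/br01.txt" "$workdir/br02.txt" > "$workdir/br0102_3.txt"

cat "$workdir/br0102_3.txt"
```

Rows are printed in the order their keys first appear, so unmatched rows from the second file end up after all rows of the first file.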
Etan Reisner answered Oct 21 '22 17:10


You are using fixed-width fields, so you should use GNU awk's FIELDWIDTHS (or similar) to separate the fields. For example, if the 2nd field is the 15 chars from char 8 to char 22 inclusive in this file:

$ cat file
abc    def ghi        klm
AAAAAAAB C D E F G H IJJJJ
abc       def ghi     klm

$ awk -v FIELDWIDTHS="7 15 4" '{print "<" $2 ">"}' file
<def ghi        >
<B C D E F G H I>
<   def ghi     >

Any solution that relies on a certain number of spaces between fields will fail when there is only one space, or none at all, between two fields.
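
To illustrate that failure mode, here is a small sketch with a made-up line whose second column is completely full, so nothing separates it from the third. The line contents and the widths (7 and 15) are assumptions for the demo. substr() is used for the positional slice instead of FIELDWIDTHS so that it runs in any POSIX awk, not just gawk.

```shell
# Hypothetical line built from columns of width 7 and 15, followed by "pqr".
line=$(printf '%-7s%-15s%s' 'abc' 'def ghi jkl mno' 'pqr')

# Splitting on runs of 2+ whitespace characters merges columns 2 and 3.
nf=$(printf '%s\n' "$line" | awk -F '[[:space:]][[:space:]]+' '{ print NF }')

# Slicing by position (15 chars starting at char 8) still finds column 2.
second=$(printf '%s\n' "$line" | awk '{ print substr($0, 8, 15) }')

echo "fields seen with the regex separator: $nf"
echo "second field by position: <$second>"
```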

If you want to strip leading/trailing blanks from your target field(s):

$ awk -v FIELDWIDTHS="7 15 4" '{gsub(/^\s+|\s+$/,"",$2); print "<" $2 ">"}' file
<def ghi>
<B C D E F G H I>
<def ghi>
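
The same positional idea can drive the join from the question. The sketch below is not the original poster's layout. It assumes column widths of 19 and 32 characters, uses shortened made-up values in the third column, and again uses substr() as a portable stand-in for FIELDWIDTHS. The file names and workdir are hypothetical.

```shell
workdir=$(mktemp -d)

# Hypothetical sample data, padded to the assumed widths of 19 and 32.
printf '%-19s%-32s%s\n' \
    ABCD  'TEXT1 TEXT2' 1231 \
    QWERT 'TEXT5TEXT6'  5678 > "$workdir/a.txt"
printf '%-19s%-32s%s\n' \
    ABCD  'TEXT1 TEXT2' 9999 \
    MNHT  'TEXT8 TEXT9' 4444 > "$workdir/b.txt"

awk '
# The join key is the first 51 characters, i.e. columns 1 and 2 with padding.
function key(s) { return substr(s, 1, 51) }

NR == FNR { k = key($0); line[k] = $0; order[++n] = k; next }
{
    k = key($0)
    if (k in line)
        line[k] = line[k] " " substr($0, 52)   # append column 3 of the second file
    else {
        line[k] = $0                           # unmatched row from the second file
        order[++n] = k
    }
}
END { for (i = 1; i <= n; i++) print line[order[i]] }
' "$workdir/a.txt" "$workdir/b.txt" > "$workdir/joined.txt"

cat "$workdir/joined.txt"
```

Because the key is taken by position, this keeps working when a column is completely filled, as long as both files pad their columns to the same widths.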
Ed Morton answered Oct 21 '22 16:10