Below are three files -
emp.txt
7839|BLAKE|PRESIDENT||17-NOV-81|5000||10
7698|KING|MANAGER|7839|01-MAY-81|2850||10
7782|CLARK|MANAGER|7839|09-JUN-81|2450||10
7566|JONES|MANAGER|7839|02-APR-81|2975||40
7788|SCOTT|ANALYST|7566|19-APR-87|3000||50
7902|FORD|ANALYST|7566|03-DEC-81|3000||20
dept.txt
10|ACCOUNTING|NEW YORK
20|RESEARCH|DALLAS
30|SALES|CHICAGO
40|OPERATIONS|BOSTON
citycode.txt
1123|NEW YORK
1124|DALLAS
1125|CHICAGO
1126|BOSTON
1127|WASHINGTON
expected output : $0(from emp.txt where $6 >= 2850)+$2(from dept.txt)+$1(from citycode.txt)
7839|BLAKE|PRESIDENT||17-NOV-81|5000||10|ACCOUNTING|1123
7698|KING|MANAGER|7839|01-MAY-81|2850||10|ACCOUNTING|1123
7566|JONES|MANAGER|7839|02-APR-81|2975||40|OPERATIONS|1126
7788|SCOTT|ANALYST|7566|19-APR-87|3000||50|NULL|NULL
7902|FORD|ANALYST|7566|03-DEC-81|3000||20|RESEARCH|1124
Explanation - join these files using common key of emp.txt($NF : outer join on emp.txt--record no 50 doesn't have common that is why it will print NULL) and dept.txt($1) where emp($6 >= 2850) and use common column of dept.txt($NF) and citycode.txt($1). joining two files is an easy task using common key but how can we join three files together.
Use the built-in FILENAME
variable to build the appropriate associative arrays when the city and department codes are being processed.
Then when processing the employees, output the desired data:
BEGIN {FS=OFS="|"}
function chknull(s) { return s ? s : "NULL"}
FILENAME == "citycode.txt" {citycode[$2]=$1}
FILENAME == "dept.txt" {depname[$1]=$2; depcity[$1]=$3}
FILENAME == "emp.txt" && $6 >= 2850 {
print $0, chknull(depname[$NF]), chknull(citycode[depcity[$NF]])
}
$ awk -f j3.awk citycode.txt dept.txt emp.txt
7839|BLAKE|PRESIDENT||17-NOV-81|5000||10|ACCOUNTING|1123
7698|KING|MANAGER|7839|01-MAY-81|2850||10|ACCOUNTING|1123
7566|JONES|MANAGER|7839|02-APR-81|2975||40|OPERATIONS|1126
7788|SCOTT|ANALYST|7566|19-APR-87|3000||50|NULL|NULL
7902|FORD|ANALYST|7566|03-DEC-81|3000||20|RESEARCH|1124
(Note that I also added the chknull
function to print "NULL" instead of empty string to match the desired output from the question, but that's not important for the meat of the question which is how to do joins with more than two files.)
awk 'BEGIN{ FS=OFS="|" }
FNR==1{ f++ }
f==1{ d[$1]=$2; c[$NF]=$1; next }
f==2 && $NF in c{ c[c[$NF]]=$1; next }
$6 >=2850{
print $0, ($NF in d ? d[$NF] : "NULL" ),( $NF in c ? c[$NF] : "NULL")
}' dept.txt citycode.txt emp.txt
Explanation
awk ' # Call awk
BEGIN{
FS=OFS="|" # Set input and output separator
}
FNR==1{ # FNR will be 1 for each file when awk reads first line
f++ # File counter
}
# use can also do FILENAME == "dept.txt" here but f==1 is easy
f==1{ # for first file dept.txt
d[$1]=$2; # set array d where index is $1 and value is $2 of file dept.txt
c[$NF]=$1; # set array c where index is last field and value is first field of file dept.txt
next # stop processing go to next line
}
# when f equal 2 that is when we read second file citycode.txt
# and last field of current file exists in array c
# there is no point in reading extra line exists in citycode.txt
# we just need whichever city exists in dept.txt
# so $NF in c
f==2 && $NF in c{
# modify array c by empid
# that is index will be the value of array c
# corresponding to last field of current file and
# array c value will be citycode
# Its because to make it easy to access array c value by empid while
# reading emp.txt file
c[c[$NF]]=$1;
# gawk user can delete array element here like below
# if you have gawk uncomment below line
# delete c[$NF]
next # stop processing go to next line
}
$6 >=2850{ # here we read emp.txt if 6th field is greater than or equal to 2850
# Print current record/row/line from emp.txt
# if last field of current file that is
# empid exists in array d then print department else NULL
# if last field of current file that is
# empid exists in array c then print citycode else NULL
print $0,($NF in d?d[$NF]:"NULL"),($NF in c?c[$NF]:"NULL")
}
' dept.txt citycode.txt emp.txt
Input
$ cat emp.txt
7839|BLAKE|PRESIDENT||17-NOV-81|5000||10
7698|KING|MANAGER|7839|01-MAY-81|2850||10
7782|CLARK|MANAGER|7839|09-JUN-81|2450||10
7566|JONES|MANAGER|7839|02-APR-81|2975||40
7788|SCOTT|ANALYST|7566|19-APR-87|3000||50
7902|FORD|ANALYST|7566|03-DEC-81|3000||20
$ cat dept.txt
10|ACCOUNTING|NEW YORK
20|RESEARCH|DALLAS
30|SALES|CHICAGO
40|OPERATIONS|BOSTON
$ cat citycode.txt
1123|NEW YORK
1124|DALLAS
1125|CHICAGO
1126|BOSTON
1127|WASHINGTON
Output
$ awk 'BEGIN{FS=OFS="|"}FNR==1{f++}f==1{d[$1]=$2;c[$NF]=$1;next}f==2 && $NF in c{c[c[$NF]]=$1;next}$6 >=2850{print $0,($NF in d?d[$NF]:"NULL"),($NF in c?c[$NF]:"NULL")}' dept.txt citycode.txt emp.txt
7839|BLAKE|PRESIDENT||17-NOV-81|5000||10|ACCOUNTING|1123
7698|KING|MANAGER|7839|01-MAY-81|2850||10|ACCOUNTING|1123
7566|JONES|MANAGER|7839|02-APR-81|2975||40|OPERATIONS|1126
7788|SCOTT|ANALYST|7566|19-APR-87|3000||50|NULL|NULL
7902|FORD|ANALYST|7566|03-DEC-81|3000||20|RESEARCH|1124
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With