Join two files by matching a specific column

Tags: linux, join, sed, awk

I'm trying to join two files, both of which are already sorted:

File1

70 CBLB Cbl proto-oncogene B
70 HOXC11 centrosomal protein 57
70 CHD4 chromodomain helicase
70 FANCF FA complementation
70 LUZP2 leucine zipper protein 2

File2

0.700140820757797 ELAVL1
0.700229616476825 HOXC11
0.700328646327188 CHD4
0.700328951649384 LUZP2

Desired output

Gene Symbol  Gene Description         Target Score mirDB   Target Score Diana
HOXC11       centrosomal protein 57   70                   0.700229616476825
CHD4         chromodomain helicase    70                   0.700328646327188
LUZP2        leucine zipper protein 2 70                   0.700328951649384

To do this, I tried the following command, but it produces an empty file:

join -j 2 -o 1.1,1.2,1.3,1.4,2.4 File1 File2 | column -t | sed '1i Gene Symbol, Gene Description, Target Score mirDB, Target Score Diana' > Output

Any help with awk or join would be appreciated.

Diego Munoz asked Jan 20 '26 22:01


1 Answer

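You can try this awk script. It first reads File2 and stores each score in an array keyed by gene symbol, then prints every File1 line whose gene symbol is in that array:

$ awk 'BEGIN {OFS="\t"; print "Gene Symbol", "Gene Description", "Target Score mirDB", "Target Score Diana"} NR==FNR {array[$2]=$1; next} $2 in array {desc=$3; for (i=4; i<=NF; i++) desc=desc" "$i; print $2, desc, $1, array[$2]}' File2 File1

The output is tab-separated (shown aligned here for readability):

Gene Symbol  Gene Description          Target Score mirDB  Target Score Diana
HOXC11       centrosomal protein 57    70                  0.700229616476825
CHD4         chromodomain helicase     70                  0.700328646327188
LUZP2        leucine zipper protein 2  70                  0.700328951649384

The same script, expanded for readability:

BEGIN {
    OFS="\t"
    print "Gene Symbol", "Gene Description", "Target Score mirDB", "Target Score Diana"
} NR==FNR {
    # First file (File2): remember each score by gene symbol
    array[$2]=$1
    next
} $2 in array {
    # Second file (File1): rebuild the multi-word description from fields 3..NF
    desc=$3
    for (i=4; i<=NF; i++) desc=desc" "$i
    print $2, desc, $1, array[$2]
}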
HatLess answered Jan 23 '26 14:01
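
Since the question also asks about join, here is a minimal sketch of that route. It is not taken from the answer above and rests on a few assumptions: join expects both inputs sorted on the join field (the gene symbol here, not the score), and the multi-word description has to be kept as a single field, so the sketch first converts both files to tab-separated form. The intermediate names File1.tsv and File2.tsv and the temporary directory are illustrative only; the sample data is copied from the question.

```shell
set -eu
export LC_ALL=C            # same collation for sort and join
tab=$(printf '\t')

# Work in a scratch directory so no existing files are overwritten
workdir=$(mktemp -d)
cd "$workdir"

cat > File1 <<'EOF'
70 CBLB Cbl proto-oncogene B
70 HOXC11 centrosomal protein 57
70 CHD4 chromodomain helicase
70 FANCF FA complementation
70 LUZP2 leucine zipper protein 2
EOF

cat > File2 <<'EOF'
0.700140820757797 ELAVL1
0.700229616476825 HOXC11
0.700328646327188 CHD4
0.700328951649384 LUZP2
EOF

# File1: turn only the first two spaces into tabs, so the description
# stays one field; then sort on the gene symbol (field 2)
awk '{ sub(/ /, "\t"); sub(/ /, "\t"); print }' File1 | sort -t "$tab" -k2,2 > File1.tsv

# File2: score and gene symbol, tab-separated, sorted on the gene symbol
awk '{ print $1 "\t" $2 }' File2 | sort -t "$tab" -k2,2 > File2.tsv

{
  printf 'Gene Symbol\tGene Description\tTarget Score mirDB\tTarget Score Diana\n'
  # -o 0 is the join field; 1.3 = description, 1.1 = mirDB score, 2.1 = Diana score
  # The final sort restores ascending order by Diana score
  join -t "$tab" -1 2 -2 2 -o 0,1.3,1.1,2.1 File1.tsv File2.tsv | sort -t "$tab" -k4,4n
} > Output

cat Output
```

Only gene symbols present in both files are printed, which is join's default behaviour. The result is tab-separated; piping it through column -t -s "$tab" aligns the columns for display.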