Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to merge two files using AWK? [duplicate]

Tags:

linux

bash

unix

awk

File 1 has 5 fields A B C D E, with field A is an integer-valued

File 2 has 3 fields A F G

The number of rows in File 1 is much bigger than that of File 2 (20^6 to 5000)

All the entries of A in File 1 appeared in field A in File 2

I like to merge the two files by field A and carry F and G

Desired output is A B C D E F G

Example

File 1

 A     B     C    D    E 4050 S00001 31228 3286 0 4050 S00012 31227 4251 0 4049 S00001 28342 3021 1 4048 S00001 46578 4210 0 4048 S00113 31221 4250 0 4047 S00122 31225 4249 0 4046 S00344 31322 4000 1 

File 2

A     F    G    4050 12.1 23.6 4049 14.4 47.8    4048 23.2 43.9 4047 45.5 21.6 

Desired output

A    B      C      D   E F    G 4050 S00001 31228 3286 0 12.1 23.6 4050 S00012 31227 4251 0 12.1 23.6 4049 S00001 28342 3021 1 14.4 47.8 4048 S00001 46578 4210 0 23.2 43.9 4048 S00113 31221 4250 0 23.2 43.9 4047 S00122 31225 4249 0 45.5 21.6 
like image 741
Tony Avatar asked Mar 29 '11 03:03

Tony


People also ask

How do I merge two files in awk?

The command basically says: join based on the second column of the first file ( -1 2 ), and the first column of the second file ( -2 1 ), and output the first column of the first file and the second column of the second file ( -o 1.1,2.2 ). That only shows the lines that paired.

What is awk '{ print $2 }'?

awk '{ print $2; }' prints the second field of each line. This field happens to be the process ID from the ps aux output. xargs kill -${2:-'TERM'} takes the process IDs from the selected sidekiq processes and feeds them as arguments to a kill command.

How do I merge two files in Linux?

To join two or more text files on the Linux command-line, you can use the cat command. The cat (short for “concatenate”) command is one of the most commonly used commands in Linux as well as other UNIX-like operating systems, used to concatenate files and print on the standard output.

What does $1 $2 indicate in awk file?

Awk works by scanning through each line of text (or record) in the file and carrying out any instructions you tell it on that line. In awk we access fields using syntax like: $1 or $2. $1 indicates that you are referring to the first field or first column.


2 Answers

$ awk 'FNR==NR{a[$1]=$2 FS $3;next}{ print $0, a[$1]}' file2 file1 4050 S00001 31228 3286 0 12.1 23.6 4050 S00012 31227 4251 0 12.1 23.6 4049 S00001 28342 3021 1 14.4 47.8 4048 S00001 46578 4210 0 23.2 43.9 4048 S00113 31221 4250 0 23.2 43.9 4047 S00122 31225 4249 0 45.5 21.6 4046 S00344 31322 4000 1 

Explanation: (Partly based on another question. A bit late though.)

FNR refers to the record number (typically the line number) in the current file and NR refers to the total record number. The operator == is a comparison operator, which returns true when the two surrounding operands are equal. So FNR==NR{commands} means that the commands inside the brackets only executed while processing the first file (file2 now).

FS refers to the field separator and $1, $2 etc. are the 1st, 2nd etc. fields in a line. a[$1]=$2 FS $3 means that a dictionary(/array) (named a) is filled with $1 key and $2 FS $3 value.

; separates the commands

next means that any other commands are ignored for the current line. (The processing continues on the next line.)

$0 is the whole line

{print $0, a[$1]} simply prints out the whole line and the value of a[$1] (if $1 is in the dictionary, otherwise only $0 is printed). Now it is only executed for the 2nd file (file1 now), because of FNR==NR{...;next}.

like image 82
kurumi Avatar answered Sep 21 '22 16:09

kurumi


Thankfully, you don't need to write this at all. Unix has a join command to do this for you.

join -1 1 -2 1 File1 File2 

Here it is "in action":

will-hartungs-computer:tmp will$ cat f1 4050 S00001 31228 3286 0 4050 S00012 31227 4251 0 4049 S00001 28342 3021 1 4048 S00001 46578 4210 0 4048 S00113 31221 4250 0 4047 S00122 31225 4249 0 4046 S00344 31322 4000 1 will-hartungs-computer:tmp will$ cat f2 4050 12.1 23.6 4049 14.4 47.8    4048 23.2 43.9 4047 45.5 21.6 will-hartungs-computer:tmp will$ join -1 1 -2 1 f1 f2 4050 S00001 31228 3286 0 12.1 23.6 4050 S00012 31227 4251 0 12.1 23.6 4049 S00001 28342 3021 1 14.4 47.8 4048 S00001 46578 4210 0 23.2 43.9 4048 S00113 31221 4250 0 23.2 43.9 4047 S00122 31225 4249 0 45.5 21.6 will-hartungs-computer:tmp will$  
like image 27
Will Hartung Avatar answered Sep 20 '22 16:09

Will Hartung