 

Print differences in CSV files by comparing values in column (using awk)

Tags:

bash

shell

awk

Say I have 2 files, file1.csv and file2.csv. I need to compare column 2 (counting from 0, i.e. field $3 in awk; string values) of both files and print the rows of file2.csv whose column-2 value is not present in column 2 of file1.csv.

I've tried using the following awk command:

awk -F'\t''NR==FNR{c[$3]++;next};c[$3] == 0' file1.csv file2.csv

However, this just prints all of file2.csv, even though only 2 rows in file2.csv have values that are not present in file1.csv.

Could someone tell me what it is I'm doing wrong?
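To see what the shell actually passes to awk for the command above, each argument can be printed on its own line with printf. This is a small diagnostic sketch; the file names are only printed, not opened.

```shell
#!/bin/sh
# Print every word the shell builds from the command line, one per line,
# wrapped in brackets. %s prints each argument literally, so the \t stays
# as a backslash followed by t, and $3 is not expanded (single quotes).
printf '[%s]\n' -F'\t''NR==FNR{c[$3]++;next};c[$3] == 0' file1.csv file2.csv
```

It prints three lines, the first being [-F\tNR==FNR{c[$3]++;next};c[$3] == 0]. Because there is no whitespace between the two quoted strings, the -F option and the program text arrive at awk as one argument.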

Snippet of file1.csv (Columns are numbered from 0)

ANR     26545   CallExpression                  mutex_unlock ( & mmc_test_lock )
ANR     26546   Callee                          mutex_unlock
ANR     26547   Identifier                      mutex_unlock
ANR     26548   ArgumentList                    & mmc_test_lock
ANR     26549   Argument                        & mmc_test_lock
ANR     26550   UnaryOperationExpression        & mmc_test_lock
ANR     26551   UnaryOperator                   &
ANR     26552   Identifier                      mmc_test_lock
ANR     26553   ExpressionStatement             "__free_pages ( test -> highmem , BUFFER_ORDER )"
ANR     26554   CallExpression                  "__free_pages ( test -> highmem , BUFFER_ORDER )" 
ANR     26555   Callee                          __free_pages 
ANR     26556   Identifier                      __free_pages
ANR     26557   ArgumentList                    test -> highmem
ANR     26558   Argument                        test -> highmem 
ANR     26559   PtrMemberAccess                 test -> highmem
ANR     26560   Identifier                      test
ANR     26561   Identifier                      highmem
ANR     26562   Argument                        BUFFER_ORDER
ANR     26563   Identifier                      BUFFER_ORDER 

Snippet of file2.csv

ANR     12910   CallExpression                  mutex_unlock ( & mmc_test_lock )
ANR     12911   Callee                          mutex_unlock
ANR     12912   Identifier                      mutex_unlock
ANR     12913   ArgumentList                    & mmc_test_lock
ANR     12914   Argument                        & mmc_test_lock
ANR     12915   UnaryOperationExpression        & mmc_test_lock
ANR     12916   UnaryOperator                   & 
ANR     12917   Identifier                      mmc_test_lock 
ANR     12918   IfStatement                     if ( test -> highmem )
ANR     12919   Condition                       test -> highmem 
ANR     12920   PtrMemberAccess                 test -> highmem
ANR     12921   Identifier                      test
ANR     12922   Identifier                      highmem
ANR     12923   ExpressionStatement             "__free_pages ( test -> highmem , BUFFER_ORDER )"
ANR     12924   CallExpression                  "__free_pages ( test -> highmem , BUFFER_ORDER )" 
ANR     12925   Callee                          __free_pages
ANR     12926   Identifier                      __free_pages
ANR     12927   ArgumentList                    test -> highmem
ANR     12928   Argument                        test -> highmem
ANR     12929   PtrMemberAccess                 test -> highmem
ANR     12930   Identifier                      test
ANR     12931   Identifier                      highmem
ANR     12932   Argument                        BUFFER_ORDER
ANR     12933   Identifier                      BUFFER_ORDER

Expected output:

ANR     12918   IfStatement     if ( test -> highmem )
ANR     12919   Condition       test -> highmem 
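The question counts columns from 0, while awk numbers fields from 1 ($0 is the whole record). The sketch below assumes the files are tab-separated, as the -F'\t' option implies, and prints the fields of one row from the file1.csv snippet so the mapping is visible:

```shell
#!/bin/sh
# One row from the file1.csv snippet, rebuilt with tab separators (assumption).
# awk numbers fields from 1: $2 is the ID, $3 is the node type
# (column 2 in the 0-based numbering above), $4 is the code text.
printf 'ANR\t26545\tCallExpression\tmutex_unlock ( & mmc_test_lock )\n' |
  awk -F'\t' '{ print "$2=" $2; print "$3=" $3; print "$4=" $4 }'
```

It prints $2=26545, $3=CallExpression and $4=mutex_unlock ( & mmc_test_lock ), so the node-type column that distinguishes the two extra rows is $3 in awk.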
asked Feb 01 '26 00:02 by AnnaR



1 Answer

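In the command as posted there is no space between -F'\t' and the program text, so the shell joins the two quoted strings into a single argument: awk takes the whole program as part of the field separator and then treats file1.csv as the program. Add the space, and use the in operator to test membership (referencing c[$3] in a comparison also creates an empty array entry for every key it looks up):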

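awk -F'\t' 'NR==FNR {seen[$3]; next} !($3 in seen)' file1.csv file2.csv

Field $3 is the node-type column (column 2 in the question's 0-based numbering), because awk numbers fields from 1. Using $2 would compare the ID column, whose values differ between the two files in the samples shown, so every row of file2.csv would be printed.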
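A minimal end-to-end check, assuming the files are tab-separated and using only a few rows from each snippet in the question, might look like this. It compares field $3 (the node-type column); the temporary directory is just for the demo.

```shell
#!/bin/sh
# Minimal reproduction; assumes tab-separated files (not confirmed by the question).
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

# file1.csv: a subset of the rows from the first snippet
printf 'ANR\t26545\tCallExpression\tmutex_unlock ( & mmc_test_lock )\n' >  file1.csv
printf 'ANR\t26559\tPtrMemberAccess\ttest -> highmem\n'                  >> file1.csv
printf 'ANR\t26560\tIdentifier\ttest\n'                                  >> file1.csv

# file2.csv: the same node types plus the two rows that should be reported
printf 'ANR\t12910\tCallExpression\tmutex_unlock ( & mmc_test_lock )\n' >  file2.csv
printf 'ANR\t12918\tIfStatement\tif ( test -> highmem )\n'               >> file2.csv
printf 'ANR\t12919\tCondition\ttest -> highmem\n'                        >> file2.csv
printf 'ANR\t12920\tPtrMemberAccess\ttest -> highmem\n'                  >> file2.csv
printf 'ANR\t12921\tIdentifier\ttest\n'                                  >> file2.csv

# Pass 1 (NR==FNR, i.e. file1.csv): record every field-3 value as a key.
# Pass 2 (file2.csv): print only rows whose field 3 was never recorded.
awk -F'\t' 'NR==FNR {seen[$3]; next} !($3 in seen)' file1.csv file2.csv
```

This prints the IfStatement and Condition rows (12918 and 12919). One known caveat of the NR==FNR idiom: if file1.csv is empty, NR==FNR stays true while reading file2.csv, so nothing is printed.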
answered Feb 03 '26 16:02 by anubhava



