Match strings of files in Unix

Tags: shell, unix, ksh

I have 3 files in a directory $FILES_DIR:

1) File_Apple.txt
2) File_Samsung.txt
3) File_Huwaei.txt

Header rows of File_Apple.txt

    AAA1,BBB2,CCC3

Header rows of File_Samsung.txt

    DDD1,EEE2

Header rows of File_Huwaei.txt

    FFF1,GGG2,HHH3,III4

There's another file, head_config.txt, which contains the header lines of the above 3 files.

head_config.txt

AAA1,BBB2,CCC3
DDD1,EEE2
FFF1,GGG2,HHH3,III4

Basically, I have to check that the header of each file matches the corresponding line in head_config.txt.

I can do this with a cumbersome process: copy the header row of each file individually, append it to a new file, and then compare the new file with head_config.txt.

head -1 File_Apple.txt >> new_file.txt
head -1 File_Samsung.txt >> new_file.txt
head -1 File_Huwaei.txt >> new_file.txt

Then I compare new_file.txt with head_config.txt using cmp.

How can I do this more efficiently?

Novice_Techie asked Mar 03 '23 08:03


2 Answers

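First, ensure that the lines in head_config.txt are ordered to match the alphabetical order of the file names (File_Apple.txt, File_Huwaei.txt, File_Samsung.txt), because that is the order in which the shell expands File_*. So head_config.txt becomes:

AAA1,BBB2,CCC3
FFF1,GGG2,HHH3,III4
DDD1,EEE2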

Then execute this command:

diff head_config.txt <(head -q -n1 File_*)

If every header matches, diff prints nothing and $? is 0; otherwise it prints the differing lines and $? is 1.
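The command above relies on process substitution, <(...), and on head -q, which is not specified by POSIX. For a shell or head that lacks them, the same check can be sketched with a loop and a pipe. This is only a sketch under assumptions: the scratch directory, the first_lines helper, and the result variable are hypothetical names, and the sample files are recreated just so the example runs on its own.

```shell
# Hypothetical demo setup: recreate the sample files in a scratch directory.
demo_dir=$(mktemp -d)
cd "$demo_dir"
printf 'AAA1,BBB2,CCC3\nrow\n'      > File_Apple.txt
printf 'DDD1,EEE2\nrow\n'           > File_Samsung.txt
printf 'FFF1,GGG2,HHH3,III4\nrow\n' > File_Huwaei.txt
# head_config.txt follows the glob order: Apple, Huwaei, Samsung.
printf 'AAA1,BBB2,CCC3\nFFF1,GGG2,HHH3,III4\nDDD1,EEE2\n' > head_config.txt

# Hypothetical helper: print the first line of every data file, in glob order.
first_lines() {
    for f in File_*; do
        head -n 1 "$f"
    done
}

# diff reads the collected headers from standard input ("-").
# Its exit status is 0 when both inputs are identical.
if first_lines | diff head_config.txt - >/dev/null; then
    result="headers match"
else
    result="headers differ"
fi
echo "$result"
```

With the sample files this prints headers match. Dropping the >/dev/null redirection shows which header lines differ.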

Irfan434 answered Mar 16 '23 21:03


Here is one command that does it all, printing Good for each file that matches and Bad for the ones that don't:

$ awk 'FNR==NR{hdr[NR]=$0;next} {print FILENAME, (hdr[++i]==$0?"Good":"Bad"); nextfile}' head_config.txt File_Apple.txt File_Samsung.txt File_Huwaei.txt 
File_Apple.txt Good
File_Samsung.txt Good
File_Huwaei.txt Good

The output is quite flexible and can be changed to meet special needs you may have.

How it works

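  • FNR==NR{hdr[NR]=$0;next}

    For the first file, head_config.txt, this reads each line into the array hdr. The condition FNR==NR is true only while awk is reading the first file, because FNR restarts at 1 for each file while NR keeps counting across all of them.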

  • print FILENAME, (hdr[++i]==$0?"Good":"Bad"); nextfile

    For each of the remaining files, this checks whether its first line matches the corresponding element of hdr: hdr[++i]==$0. If it does, the file name and Good are printed; otherwise, the file name and Bad are printed. nextfile then skips the rest of that file, so only its first line is examined.
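As one illustration of changing the output, here is a sketch that reports only the mismatching files and returns a non-zero exit status when any header is wrong. It is a variation on the command above, not the same command: it tests FNR==1 instead of calling nextfile, which some older awk implementations lack. The scratch directory, the deliberately wrong Samsung header, and the report, status, and bad names are assumptions made for the demo.

```shell
# Hypothetical demo setup: sample files, with a deliberately wrong Samsung header.
demo_dir=$(mktemp -d)
cd "$demo_dir"
printf 'AAA1,BBB2,CCC3\nrow\n'      > File_Apple.txt
printf 'DDD1,XXX2\nrow\n'           > File_Samsung.txt
printf 'FFF1,GGG2,HHH3,III4\nrow\n' > File_Huwaei.txt
# Config lines in the same order as the files are listed on the command line.
printf 'AAA1,BBB2,CCC3\nDDD1,EEE2\nFFF1,GGG2,HHH3,III4\n' > head_config.txt

report=$(awk '
    # First file: remember the expected header for each position.
    FNR == NR { hdr[NR] = $0; next }
    # First line of each data file: compare with the next expected header.
    FNR == 1 {
        if (hdr[++i] != $0) {
            print FILENAME ": header mismatch"
            bad = 1
        }
    }
    # Exit with 1 if any header was wrong, 0 otherwise.
    END { exit bad + 0 }
' head_config.txt File_Apple.txt File_Samsung.txt File_Huwaei.txt)
status=$?

echo "$report"
echo "exit status: $status"
```

Because only File_Samsung.txt has a wrong header in this demo, it is the only file reported, and the exit status is 1, so the command can be used directly in an if test.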

John1024 answered Mar 16 '23 22:03