 

awk: selecting lines with a variable fails with "cannot open" (No such file or directory)

Tags:

awk

I'm struggling with a basic awk command.

File 1:

AB253828.1
AB253829.1
AB253830.1
AB253831.1

File 2:

accession   accession.version   taxid   gi
A00001  A00001.1    10641   58418
A00002  A00002.1    9913    2
A00003  A00003.1    9913    3
A00004  A00004.1    32630   57971
A00005  A00005.1    32630   57972
A00006  A00006.1    32630   57973
A00008  A00008.1    32630   57974
A00009  A00009.1    32630   57975
A00010  A00010.1    32630   57976

Both files have more than 1,000,000 lines.

I would like to print columns 2 and 3 of file 2 if column 2 matches one of the entries in file 1. I tried a lot of possibilities, but none of them work...

for ACC in $(cat file1.txt)
do
    #ACC1=$(echo "\"$ACC\"")
    awk -v OFS='\t'-v z="$ACC" '{ if($2 == z) { print $2,$3 } }' file2.txt
done

I got

awk: cannot open { if($2 == z) { print $2,$3 } } file2.txt (No such file or directory)

I checked, and file2 is there. I suppose my problem is the variable z, but I can't find the solution.

Asked by Marion, Apr 20 '26 15:04



1 Answer

The immediate problem is that you are missing a space before the second -v option. Look closely: the shell glues '\t' and -v together into a single argument, so you are setting OFS to \t-v. Awk then thinks z="$ACC" is your actual Awk script, and treats your real script, { if($2 == z) { print $2,$3 } }, as the name of an input file, which it of course cannot open. But really, you want to overhaul this more thoroughly.
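To check what the shell actually hands to Awk, you can print each argument on its own line. This is a minimal sketch: show_args is a hypothetical helper, and the literal AB253828.1 stands in for $ACC.

```shell
#!/bin/sh
# Hypothetical helper: print each argument it receives, one per line,
# wrapped in brackets so word boundaries are visible.
# %s prints the argument verbatim (backslashes are not interpreted).
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Same words as the original awk call, with a literal ID in place of $ACC.
show_args -v OFS='\t'-v z="AB253828.1" '{ if($2 == z) { print $2,$3 } }' file2.txt
```

The second line comes out as [OFS=\t-v] and the third as [z=AB253828.1], which is why Awk ends up using the assignment as its program and the real program as a file name.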

awk -v OFS='\t' 'NR==FNR { z[$1]++; next }
    $2 in z { print $2,$3 }' file1.txt file2.txt

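This uses a common Awk idiom: while the first file is being read (NR==FNR is only true then), each line's first field is stored as a key in the array z; for the second file, Awk prints the records whose second field exists as a key in z. This should be orders of magnitude faster, because each file is read only once instead of rescanning file2.txt for every single line of file1.txt, and of course it also trivially fixes the "reading lines with for" antipattern.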
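Here is the same command wrapped in a small function, with each step commented. It is a sketch rather than a drop-in replacement: the function name lookup and the files demo_ids.txt and demo_table.txt are hypothetical, and the table rows are copied from the question's sample, tab-separated.

```shell
#!/bin/sh
# Hypothetical helper: print fields 2 and 3 of TABLE for every row whose
# field 2 appears as a line in IDS. Usage: lookup IDS TABLE
lookup() {
    awk -v OFS='\t' '
        # NR counts records across all input, FNR restarts for each file,
        # so NR == FNR holds only while the first file (IDS) is being read.
        NR == FNR { z[$1]++; next }   # remember each ID, skip to next line
        $2 in z   { print $2, $3 }    # TABLE: print if field 2 is a known ID
    ' "$1" "$2"
}

# Demo data (hypothetical file names; rows taken from the question).
printf 'A00002.1\nA00009.1\n' > demo_ids.txt
printf 'accession\taccession.version\ttaxid\tgi\n' > demo_table.txt
printf 'A00001\tA00001.1\t10641\t58418\n' >> demo_table.txt
printf 'A00002\tA00002.1\t9913\t2\n' >> demo_table.txt
printf 'A00009\tA00009.1\t32630\t57975\n' >> demo_table.txt

lookup demo_ids.txt demo_table.txt
```

With this sample, the function prints A00002.1 and A00009.1 together with their taxids (9913 and 32630). The header line drops out on its own, because accession.version is not one of the IDs.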

If the first file is too large to fit into memory at once, you could split it into smaller pieces (say, 500,000 lines each) and run this on each piece separately. It should be easy to see when Awk consumes so much memory that your system starts thrashing; at least during the first few runs, keep an eye on top or a similar monitoring tool, and kill the process if it misbehaves.
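The partitioning can be done with the standard split utility. In this sketch, chunked_lookup is a hypothetical helper and the chunk size is a parameter; each chunk gets its own Awk run, so only that chunk's IDs are held in memory, at the cost of reading the second file once per chunk.

```shell
#!/bin/sh
# Hypothetical helper: same Awk program as the one-pass version, but IDS
# is processed in chunks of LINES lines, one Awk run per chunk.
# Usage: chunked_lookup IDS TABLE LINES > matches.txt
chunked_lookup() {
    ids=$1 table=$2 lines=$3
    tmpdir=$(mktemp -d) || return 1
    # split writes the chunks as part_aa, part_ab, ... inside tmpdir
    split -l "$lines" "$ids" "$tmpdir/part_"
    for chunk in "$tmpdir"/part_*; do
        # skip an unmatched glob or an empty chunk (NR == FNR needs data)
        [ -s "$chunk" ] || continue
        awk -v OFS='\t' 'NR == FNR { z[$1]++; next }
            $2 in z { print $2, $3 }' "$chunk" "$table"
    done
    rm -rf "$tmpdir"
}
```

For example, chunked_lookup file1.txt file2.txt 500000 > matches.txt. Note that the output is grouped by chunk, so its order can differ from a single run over the whole file.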

Answered by tripleee, Apr 25 '26 13:04



