
Why does awk "not in" array work just like awk "in" array?

Tags: awk, gawk
Here's an awk script that attempts to compute the set difference of two files based on their first column:

BEGIN{
    OFS=FS="\t"
    file = ARGV[1]
    while (getline < file)
        Contained[$1] = $1
    delete ARGV[1]
    }
$1 not in Contained{
    print $0
}

Here is TestFileA:

cat
dog
frog

Here is TestFileB:

ee
cat
dog
frog

However, when I run the following command:

gawk -f Diff.awk TestFileA TestFileB

I get the output just as if the script had contained "in":

cat
dog
frog

While I am uncertain about whether "not in" is correct syntax for my intent, I'm very curious about why it behaves exactly the same way as when I wrote "in".

merlin2011 asked Jun 06 '12 23:06


4 Answers

I cannot find any documentation for element not in array.

Try !(element in array).
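As a minimal sketch of that suggestion applied to the script from the question: the file name DiffFixed.awk is made up for this example, and the `> 0` check on getline and the close() call are additions that the original script did not have.

```shell
# Recreate the two test files from the question.
printf 'cat\ndog\nfrog\n' > TestFileA
printf 'ee\ncat\ndog\nfrog\n' > TestFileB

# DiffFixed.awk is a hypothetical name for the corrected script.
cat > DiffFixed.awk <<'EOF'
BEGIN {
    OFS = FS = "\t"
    file = ARGV[1]
    # Load column one of the first file; "> 0" stops at EOF and on read errors.
    while ((getline < file) > 0)
        Contained[$1] = $1
    close(file)
    delete ARGV[1]   # keep the main loop from reading the first file again
}
# Negate the whole membership test.
!($1 in Contained) {
    print $0
}
EOF

awk -f DiffFixed.awk TestFileA TestFileB
```

With the test files from the question this should print only ee, the one first-column value of TestFileB that is missing from TestFileA.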


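My guess: awk treats not as an uninitialized variable, which evaluates to the empty string. String concatenation binds more tightly than in, so $1 is first concatenated with that empty string and the result is then looked up in the array:

$1 not in Contained  ->  ($1 not) in Contained  ->  ($1 "") in Contained  ->  $1 in Contained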
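A quick way to check this reading from the command line is sketched below. The array name Seen and the two-line input are made up for the demonstration; nothing here comes from the original script.

```shell
# "not" is never assigned, so it prints as an empty string between the brackets.
awk 'BEGIN { print "[" not "]" }'

# Concatenation is applied before "in", so these two patterns select the same lines.
printf 'ee\ncat\n' | awk 'BEGIN { Seen["cat"] } $1 not in Seen'
printf 'ee\ncat\n' | awk 'BEGIN { Seen["cat"] } ($1 "") in Seen'
```

The first command should print [] and the other two should each print cat, which matches the behaviour described in the question.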
kev answered Sep 27 '22 23:09


I figured this one out. The expression (x in array) returns a value (1 if the index exists, 0 otherwise), so to test "not in array", you can do this:

if ( x in array == 0 )
   print "x is not in the array"

or in your example:

($1 in Contained == 0){
   print $0
}
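To illustrate the value that in returns, and why it is preferable to comparing array[x] against an empty string, here is a small sketch. The array name Seen and the variable missing are assumptions made for the example.

```shell
awk 'BEGIN {
    Seen["cat"]                                   # create one element
    printf "%d %d\n", ("cat" in Seen), ("ee" in Seen)

    # Merely referencing Seen["ee"] creates the element as a side effect...
    if (Seen["ee"] == "")
        missing = 1

    # ...so a later membership test now reports it as present.
    printf "%d\n", ("ee" in Seen)
}'
```

This should print 1 0 on the first line and 1 on the second: the in test itself never adds an index, while the comparison against "" does.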
Jeff answered Sep 28 '22 00:09


In my solution for this problem I use the following if-else statement:

if ($1 in Contained); else { print "Here goes your code for \"not in\"" }
Peter answered Sep 27 '22 23:09


Not sure if this is anything like you were trying to do.

#!/bin/awk -f
# Reads the first file argument and builds a hash of the tokens found in
# column one. Then it reads the input files (the first one again, since
# ARGV[1] is not deleted) and prints column one of any line whose token
# is not already in the hash.
BEGIN{
    OFS=FS="\t"
    file = ARGV[1]
    while ((getline < file) > 0)
        Contained[$1] = $1
#    delete ARGV[1]  # I don't know what you were thinking here
#    for(i in Contained) {print Contained[i]} # debugging, not just for sadists
    close (ARGV[1])
}
{
   if ($1 in  Contained){} else { print $1 }
}
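For comparison, the same two-file lookup is often written without a BEGIN block by using the NR == FNR idiom. This is only a sketch under the assumption that the first file is non-empty (with an empty first file, NR == FNR would also be true for the second file); it is not taken from any of the answers here.

```shell
# Recreate the two test files from the question.
printf 'cat\ndog\nfrog\n' > TestFileA
printf 'ee\ncat\ndog\nfrog\n' > TestFileB

# While reading the first file NR equals FNR, so column one is recorded and
# the record is skipped; for later files only unseen values are printed.
awk -F'\t' 'NR == FNR { Contained[$1]; next } !($1 in Contained)' TestFileA TestFileB
```

With the question's test files this should print ee.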

starbolin answered Sep 27 '22 23:09