Why does AWK not treat this array index as a number unless I use int()?

Q: How do you declare an array in awk?

In awk, you don't need to declare an array or specify its size before you start using it. Any number or string may be used as an array index, not just consecutive integers. In most other languages, by contrast, you have to declare an array and specify how many elements it contains.

Q: How are arrays processed using awk?

AWK has associative arrays, and one of the best things about them is that the indexes need not be a contiguous set of numbers; you can use either a string or a number as an array index. There is also no need to declare the size of an array in advance: arrays can grow and shrink at runtime.
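As a minimal sketch of those points (the index values are made up for illustration), the following stores string and numeric indexes in the same array without declaring it, checks that a numeric index is really stored as its string form, and then shrinks the array with delete:

```shell
out=$(awk 'BEGIN {
    a["chr2"] = 1                  # string index
    a[41647]  = 1                  # numeric index, stored as the string "41647"
    a[45895]  = 1

    n = 0; for (k in a) n++        # count the elements
    print n

    # A numeric index and its string form refer to the same element.
    print (("41647" in a) ? "same element" : "different element")

    delete a[45895]                # arrays can shrink at runtime
    n = 0; for (k in a) n++
    print n
}')
echo "$out"
```

This should print 3, then "same element", then 2.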

Q: Does awk have arrays?

The awk language provides one-dimensional arrays for storing groups of related strings or numbers. Every awk array must have a name. Array names have the same syntax as variable names; any valid variable name would also be a valid array name.

Tags:

arrays

bash

awk

I have genomics files of the following type:

$ cat test-file_long.txt 
2 41647 A G
2 45895 A G
2 45953 T C
2 224919 A G
2 230055 C G
2 233239 A G
2 234130 T G
2 23454 T C

When I use the following short AWK script, it does not return all of the elements which are greater than the element used in the if statement:

{
    a[$2]
}
END {
    for (i in a) {
        if (i > 45895)
            print i
    }
}

The script returns this:

$ awk -f practice.awk test-file_long.txt 
45953

However, when I change the if statement using int(), it returns the lines that are in fact greater than, as I want:

{
    a[$2]
}
END {
    for (i in a) {
        if (int(i) > 45895)
            print i
    }
}

Result:

$ awk -f practice.awk test-file_long.txt 
233239
230055
234130
224919
45953

It appears it is only making the comparison with the first digit, and if they are the same it looks at the next digit, but it does not process the whole number. Can someone explain to me what it is about the internal mechanism of the associative array that it does not make the numeric >/< comparison unless I specify that I want the int() of the array element? What if my array elements were floats and int() was not an option?

asked Apr 24 '14 15:04

isosceleswheel

1 Answer

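Array keys in awk are always strings, so the loop variable i holds a string and i > 45895 is a string (lexicographic) comparison, done character by character. In your first example, "45953" compares greater than "45895" because "459" sorts after "458", so it passes the test. A value such as "224919" fails because its first character, "2", sorts before "4".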
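A quick way to see the difference is to compare the same value once as a string and once as a number. This is only a sketch; the variable name key is made up, and it is assigned a string constant to stand in for an array index:

```shell
cmp=$(awk 'BEGIN {
    key = "224919"             # a string, like every array index
    print (key > 45895)        # string comparison: "2" sorts before "4"
    print (key + 0 > 45895)    # adding 0 forces a numeric comparison
}')
echo "$cmp"
```

The first comparison should print 0 (false) and the second 1 (true).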

If your only goal is to print the lines whose 2nd column is > 45895 numerically, this would do. It works because a field that looks like a number is compared numerically, unlike an array index:

awk '$2 > 45895' test-file_long.txt

Variables change type depending on the context in which they are evaluated, so putting a variable in an explicitly numeric context makes awk treat it as a number. @glenn's suggestion of i+0 demonstrates this perfectly.
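Applied to the script from the question, the i+0 idiom could look like the sketch below. The temporary file and the variable names data and result are only there to make the example self-contained, and the output is piped through sort -n because the order in which for (i in a) visits the keys is unspecified:

```shell
# Recreate the sample data from the question in a temporary file.
data=$(mktemp)
cat > "$data" <<'EOF'
2 41647 A G
2 45895 A G
2 45953 T C
2 224919 A G
2 230055 C G
2 233239 A G
2 234130 T G
2 23454 T C
EOF

# i + 0 turns the string key into a number before comparing.
result=$(awk '{ a[$2] } END { for (i in a) if (i + 0 > 45895) print i }' "$data" | sort -n)
echo "$result"
rm -f "$data"
```

This should list 45953, 224919, 230055, 233239 and 234130, the same five values the int() version returns.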

Alternatively, the unary plus operator +i can be used to convert an expression to a number. So your longer example could be changed to:

awk '{a[$2]} END { for (i in a) { if (+i > 45895) print i } }' test-file_long.txt
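As for the case where the keys are floats: a numeric conversion keeps the fractional part, whereas int() would truncate 2.75 to 2 and wrongly reject it against a threshold of 2.25. A small sketch with made-up input values, using i + 0 for the conversion and sort -n only to get a stable output order:

```shell
floats=$(printf '%s\n' 'x 1.5' 'x 10.25' 'x 2.75' 'x 2.2' |
    awk '{ a[$2] } END { for (i in a) if (i + 0 > 2.25) print i }' |
    LC_ALL=C sort -n)
echo "$floats"
```

This should print 2.75 and 10.25.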

answered Oct 13 '22 13:10

Tom Fenech
