I'd like to read filein.txt (tab-delimited) and write a fileout.txt that contains only the rows where a given column matches a given value, with the queried column itself removed. For example:
filein.txt
#name\thouse\taddress
roger\tvictorian\t223 dolan st.
maggie\tfrench\t12 alameda ave.
kingston\tvictorian\t224 house st.
robert\tamerican\t22 dolan st.
Let us say I'd like to select only the rows where the house is of victorian style; then my fileout.txt should look like:
fileout.txt
#name\taddress
roger\t223 dolan st.
kingston\t224 house st.
AWK can be used at the command line in conjunction with other UNIX commands to build a pipeline of operations that act on a data file, or inside a shell script. You can also put an AWK program in its own file and run it with awk -f source-file. There are many more features in the AWK language that are not covered here.
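As a rough sketch of the awk -f form, the filter from this question could live in its own program file. The file name filter.awk and the awk variable name style are made up for illustration, and the sample input is recreated with printf so the snippet can be run as-is:

```shell
# Recreate the question's tab-delimited sample input.
# printf reuses the format string until all arguments are consumed.
printf '%s\t%s\t%s\n' \
    '#name'  house     address \
    roger    victorian '223 dolan st.' \
    maggie   french    '12 alameda ave.' \
    kingston victorian '224 house st.' \
    robert   american  '22 dolan st.' > filein.txt

# filter.awk is a hypothetical file name holding the awk program itself.
cat > filter.awk <<'EOF'
BEGIN { FS = OFS = "\t" }       # read and write tab-delimited fields
$2 == style { print $1, $3 }    # keep matching rows, leave out column 2
EOF

# -f reads the program from the file; -v sets the awk variable "style".
awk -v style="victorian" -f filter.awk filein.txt > fileout.txt
cat fileout.txt
```

Keeping the program in a file avoids shell-quoting problems once the script grows beyond a line or two.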
awk -F"\t" '$2 == "victorian" { print $1"\t"$3 }' filein.txt > fileout.txt
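Note that the one-liner above drops the header row, because "house" is not equal to "victorian". If the header should survive, as in the expected fileout.txt, one possible variation adds an NR == 1 condition. This is a sketch that assumes the header is always the first line of the file:

```shell
# Recreate the question's tab-delimited sample input.
printf '%s\t%s\t%s\n' \
    '#name'  house     address \
    roger    victorian '223 dolan st.' \
    maggie   french    '12 alameda ave.' \
    kingston victorian '224 house st.' \
    robert   american  '22 dolan st.' > filein.txt

# NR == 1 matches the header line (assumed to be line 1);
# the second condition matches rows whose 2nd field is "victorian".
awk -F'\t' 'NR == 1 || $2 == "victorian" { print $1 "\t" $3 }' filein.txt > fileout.txt
cat fileout.txt
```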
You can do it with the following awk script:
#!/bin/bash
style="victorian"
awk -v s_style="$style" 'BEGIN{FS=OFS="\t"}
$2==s_style {$2=""; sub("\t\t","\t"); print}' filein.txt > fileout.txt
Explanation:
style="victorian": assign the house style that you want to select outside of the awk script, so it's easier to maintain.
awk: invoke awk.
-v s_style="$style": the -v option passes an external variable into awk. You need to specify it once for each variable you pass in. In this case it assigns the shell variable $style to the awk variable s_style. The double quotes keep the value intact if it ever contains spaces.
BEGIN{FS=OFS="\t"}: tells awk that fields are separated by tabs in both the input (FS) and the output (OFS), instead of the defaults (whitespace for input, a single space for output).
$2==s_style {$2=""; sub("\t\t","\t"); print}: if the 2nd field is the house style specified in s_style (in this case, victorian), empty it, collapse the resulting pair of adjacent tabs into one, and print the line.
Alternatively, you could do:
#!/bin/bash
style="victorian"
awk -v s_style="$style" 'BEGIN{FS=OFS="\t"}
$2==s_style {print $1, $3}' filein.txt > fileout.txt
However, this prints only fields 1 and 3, so it assumes that your input files will not gain additional tab-separated fields in the future; any such fields would be silently dropped.
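If the input may gain extra tab-separated fields, one option is to rebuild each line from every field except the queried one. The following is only a sketch: the variable col, the NR == 1 header handling (assuming the header is line 1), and the fourth column named extra with placeholder values x1 to x4 are all made up for illustration and are not part of the original data.

```shell
# Sample input from the question plus a hypothetical 4th column ("extra").
printf '%s\t%s\t%s\t%s\n' \
    '#name'  house     address           extra \
    roger    victorian '223 dolan st.'   x1 \
    maggie   french    '12 alameda ave.' x2 \
    kingston victorian '224 house st.'   x3 \
    robert   american  '22 dolan st.'    x4 > filein.txt

style="victorian"
col=2   # hypothetical: number of the column to test and then leave out

awk -v s_style="$style" -v col="$col" 'BEGIN { FS = OFS = "\t" }
NR == 1 || $col == s_style {        # keep the header and the matching rows
    line = ""; sep = ""
    for (i = 1; i <= NF; i++) {
        if (i == col) continue      # skip the queried column
        line = line sep $i          # append the field to the output line
        sep = OFS                   # separator is needed after the first field
    }
    print line
}' filein.txt > fileout.txt
cat fileout.txt
```

Because the loop runs up to NF, the same program works unchanged whether the file has three columns or thirty, and it avoids relying on sub("\t\t","\t") finding the right pair of tabs.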