I have a pretty simple question. I have a file containing several columns, and I want to filter its lines using awk.
The column of interest is the 6th column, and I want to find every string containing a number from 1 to 100, then S or M, then a number from 1 to 100, then S or M.
For example: 20S50M is OK.
I tried :
awk '{ if($6 == '/[1-100][S|M][1-100][S|M]/') print} file.txt
but it didn't work... What am I doing wrong?
In awk, regular expressions (regex) allow for dynamic and complex pattern definitions. You're not limited to searching for simple strings; you can also match patterns within patterns.
Any awk expression is valid as an awk pattern. The pattern matches if the expression's value is nonzero (if a number) or non-null (if a string). The expression is reevaluated each time the rule is tested against a new input record.
This should do the trick:
awk '$6~/^(([1-9]|[1-9][0-9]|100)[SM]){2}$/' file
Regexplanation:
^                         # Match the start of the string
(([1-9]|[1-9][0-9]|100)   # Match a single digit 1-9 or double digit 10-99 or 100
[SM]                      # Character class matching the character S or M
){2}                      # Repeat everything in the parens twice
$                         # Match the end of the string
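To see the pattern in action, here is a minimal sketch run against made-up data. The rows in `sample` and the `unit` variable are assumptions for illustration, not part of the original question. The unit is concatenated twice rather than written with `{2}`, because some older awk builds may not support interval expressions; the resulting regex should be equivalent.

```shell
# Hypothetical sample data: six whitespace-separated columns per line.
sample='r1 0 chr1 100 60 20S50M
r2 0 chr1 200 60 100M100S
r3 0 chr1 300 60 101S5M
r4 0 chr1 400 60 0S50M
r5 0 chr1 500 60 20S50M3S
r6 0 chr1 600 60 50M'

# One "number from 1 to 100, then S or M" unit.
unit='([1-9]|[1-9][0-9]|100)[SM]'

# "^" unit unit "$" builds the same regex as ^(unit){2}$ by string concatenation.
matched=$(printf '%s\n' "$sample" |
    awk -v unit="$unit" '$6 ~ ("^" unit unit "$") { print $1 }')

printf '%s\n' "$matched"
```

Only r1 and r2 pass: 101 and 0 fall outside 1 to 100, r5 has three units, and r6 has only one.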
You have quite a few issues with your statement:
awk '{ if($6 == '/[1-100][S|M][1-100][S|M]/') print} file.txt
== is the string comparison operator. The regex comparison operator is ~.
You don't quote regular expressions (nothing in an awk program is wrapped in single quotes besides the script itself), and your script is missing the final closing single quote.
[0-9] is the character class for the digit characters; it's not a numeric range. It matches any single character in the class 0,1,2,3,4,5,6,7,8,9, not any numerical value inside the range. So [1-100] is not the regular expression for numbers in the numerical range 1 - 100; it would match either a 1 or a 0.
[SM] is equivalent to (S|M). What you tried, [S|M], is the same as (S|\||M). You don't need the OR operator in a character class.
Awk uses the following structure: condition{action}. If the condition is true, the actions in the following block {} get executed for the current record being read. The condition in my solution is $6~/^(([1-9]|[1-9][0-9]|100)[SM]){2}$/, which can be read as "does the sixth column match the regular expression?" If true, the line gets printed, because if you don't give any action, awk will execute {print $0} by default.
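The points about character classes and the default action can be checked directly. This is a small sketch using throwaway input lines invented for the demonstration; the variable names are arbitrary.

```shell
# [1-100] holds the range 1-1 plus two literal 0s, so it matches one character: 1 or 0.
digits=$(printf '%s\n' 1 0 7 100 | awk '/^[1-100]$/')

# [S|M] treats | as an ordinary member of the class.
letters=$(printf '%s\n' S M '|' X | awk '/^[S|M]$/')

# A condition with no action block behaves like condition { print $0 }.
implicit=$(printf '%s\n' 'a 1' 'b 2' | awk '$2 == 2')
explicit=$(printf '%s\n' 'a 1' 'b 2' | awk '$2 == 2 { print $0 }')

printf '%s\n' "$digits" "$letters" "$implicit" "$explicit"
```

Here `digits` keeps only the lines 1 and 0, `letters` keeps S, M, and the literal |, and the implicit and explicit forms both print the line "b 2".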
Regexes match characters, not numeric values, so "a number from 1 to 100" is awkward to express as a regex. What you can easily do is check for "1-3 digits."
You want something like this
/[0-9]{1,3}[SM][0-9]{1,3}[SM]/
(awk's regexes don't support the \d shorthand, so the digit class is written as [0-9].)
Note that the character class [SM] doesn't have the | alternation character. You would only need that if you were writing it as (S|M).
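A "1-3 digits" check is looser than "1 to 100". The sketch below compares it with a numeric test on made-up rows. The `sample` data is hypothetical, and the arithmetic variant is an alternative swapped in for illustration, not something either answer proposed: it uses a regex only to confirm the shape, then lets awk compare the numbers.

```shell
# Hypothetical sample data: six whitespace-separated columns per line.
sample='r1 0 chr1 100 60 20S50M
r2 0 chr1 200 60 100M100S
r3 0 chr1 300 60 101S5M
r4 0 chr1 400 60 0S50M
r5 0 chr1 500 60 20S50M3S
r6 0 chr1 600 60 50M'

# Loose, unanchored check: 1-3 digits, S or M, 1-3 digits, S or M, anywhere in column 6.
# Written with ? instead of {1,3} so it does not rely on interval support.
loose=$(printf '%s\n' "$sample" |
    awk '$6 ~ /[0-9][0-9]?[0-9]?[SM][0-9][0-9]?[0-9]?[SM]/ { print $1 }')

# Numeric check: confirm the shape with a regex, then compare the numbers arithmetically.
in_range=$(printf '%s\n' "$sample" | awk '
$6 ~ /^[0-9]+[SM][0-9]+[SM]$/ {
    split($6, parts, /[SM]/)      # parts[1] and parts[2] hold the two numbers
    a = parts[1] + 0
    b = parts[2] + 0
    if (a >= 1 && a <= 100 && b >= 1 && b <= 100)
        print $1
}')

printf 'loose: %s\n' "$(printf '%s' "$loose" | tr '\n' ' ')"
printf 'in range: %s\n' "$(printf '%s' "$in_range" | tr '\n' ' ')"
```

The loose pattern accepts r1 through r5, including 101S5M, 0S50M, and the three-unit 20S50M3S, while the numeric check keeps only r1 and r2.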