I am attempting to use Regex with the grep command in the linux terminal in order to filter lines in a text file that start with Capital letter and end with a positive integer. Is there a way to modify my command so that it does this all in one line with one call of grep instead of two? I am using windows subsystem for linux and the microsoft store ubuntu.
Text File:
C line 1
c line 2
B line 3
d line 4
E line five
The command that I have gotten to work:
grep ^[A-Z] cap*| grep [0-9]$ cap*
The Output
C line 1
B line 3
This works but i feel like the regex statement could be combined somehow but
grep ^[A-Z][0-9]$ 
does not yield the same result as the command above.
You need to use
grep '^[A-Z].*[0-9]$'
grep '^[[:upper:]].*[0-9]$'
See the online demo. The regex matches:
^ - start of string[A-Z] / [[:upper:]] - an uppercase letter.* - any zero or more chars ([^0-9]* matches zero or more non-digit chars)[0-9] - a digit.$ - end of string.Also, if you want to make sure there is no - before the number at the end of string, you need to use a negated bracket expression, like
grep -E '^[[:upper:]](.*[^-0-9])?[1-9][0-9]*$'
Here, the POSIX ERE regx (due to -E option) matches
^[[:upper:]] - an uppercase letter at the start and then(.*[^-0-9])? - an optional occurrence of any text and then any char other than a digit and -[1-9] - a non-zero digit[0-9]* - zero or more digits$ - end of string.When you use a pipeline, you want the second grep to act on standard input, not on the file you originally grepped from.
grep ^[A-Z] cap*| grep [0-9]$
However, you need to expand the second regex if you want to exclude negative numbers. Anyway, a better solution altogether might be to switch to Awk:
awk '/^[A-Z]/ && /[0-9]$/ && $NF > 0' cap*
The output format will be slightly different than from grep; if you want to include the name of the matching file, you have to specify that separately:
awk '/^[A-Z]/ && /[0-9]$/ && $NF > 0 { print FILENAME ":" $0 }' cap*
The regex ^[A-Z][0-9]$ matches exactly two characters, the first of which must be an alphabetic, and the second one has to be a number.  If you want to permit arbitrary text between them, that would be ^[A-Z].*[0-9]$ (and for less arbitrary, use something a bit more specific than .*, like (.*[^-0-9])? perhaps, where you need grep -E for the parentheses and the question mark for optional, or backslashes before each of these for the BRE regex dialect you get out of the box with POSIX grep).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With