How do I filter lines in a text file that start with a capital letter and end with a positive integer with regex on the command line in linux?

Tags: regex, linux, grep

I am trying to use a regex with the grep command in the Linux terminal to filter lines in a text file that start with a capital letter and end with a positive integer. Is there a way to modify my command so that it does this in one line with a single call to grep instead of two? I am using Windows Subsystem for Linux with Ubuntu from the Microsoft Store.

Text File:

C line 1
c line 2
B line 3
d line 4
E line five

The command that I have gotten to work:

grep ^[A-Z] cap*| grep [0-9]$ cap*

The Output

C line 1
B line 3

This works, but I feel like the two regexes could be combined somehow. However,

grep ^[A-Z][0-9]$ 

does not yield the same result as the command above.

Chuck Woody asked Oct 31 '25 22:10


2 Answers

You can use either of the following:

grep '^[A-Z].*[0-9]$'
grep '^[[:upper:]].*[0-9]$'

The regex matches:

  • ^ - start of the line (grep tests each line separately)
  • [A-Z] / [[:upper:]] - an uppercase letter
  • .* - zero or more of any character (a stricter alternative, [^0-9]*, matches zero or more non-digit characters)
  • [0-9] - a digit
  • $ - end of the line
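One way to check the pattern is to recreate the sample file from the question and run a single grep call against it. This is only a sketch: the temporary file made with mktemp is a stand-in for the asker's cap* file, and [[:upper:]] is used instead of [A-Z] so the result does not depend on the locale's collation order.

```shell
# Recreate the sample data from the question in a temporary file
# (a stand-in for the asker's cap* file).
sample=$(mktemp)
printf '%s\n' 'C line 1' 'c line 2' 'B line 3' 'd line 4' 'E line five' > "$sample"

# One grep call: uppercase letter at the start of the line, digit at the end.
result=$(grep '^[[:upper:]].*[0-9]$' "$sample")
printf '%s\n' "$result"
# prints:
# C line 1
# B line 3

rm -f "$sample"
```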

Also, if you want to make sure there is no - before the number at the end of the line (so that negative numbers are rejected), use a negated bracket expression, like

grep -E '^[[:upper:]](.*[^-0-9])?[1-9][0-9]*$'

Here, the POSIX ERE regex (enabled by the -E option) matches

  • ^[[:upper:]] - an uppercase letter at the start of the line and then
  • (.*[^-0-9])? - an optional occurrence of any text followed by any char other than a digit and -
  • [1-9] - a non-zero digit
  • [0-9]* - zero or more digits
  • $ - end of the line.
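To see what the stricter pattern accepts and rejects, it can be run against a few extra lines. The input lines below are made up for illustration and are not from the question; the temporary file is again a stand-in for the real data.

```shell
# Hypothetical test lines: positive, negative, zero, multi-digit, no space.
sample=$(mktemp)
printf '%s\n' 'C line 1' 'A value -5' 'D count 0' 'F total 42' 'G10' 'e line 7' > "$sample"

# -E selects ERE so that ( ) and ? work without backslashes.
strict=$(grep -E '^[[:upper:]](.*[^-0-9])?[1-9][0-9]*$' "$sample")
printf '%s\n' "$strict"
# prints:
# C line 1
# F total 42
# G10

rm -f "$sample"
```

"A value -5" is rejected because the digit run is preceded by a hyphen, "D count 0" because the number must start with a non-zero digit, and "e line 7" because it starts with a lowercase letter.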
Wiktor Stribiżew answered Nov 03 '25 12:11


When you use a pipeline, you want the second grep to act on standard input, not on the file you originally grepped from.

grep '^[A-Z]' cap* | grep '[0-9]$'
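The difference is easy to observe by counting matches both ways. This is a small sketch using a temporary copy of the question's sample data; grep -c prints the number of matching lines.

```shell
sample=$(mktemp)
printf '%s\n' 'C line 1' 'c line 2' 'B line 3' 'd line 4' 'E line five' > "$sample"

# The second grep is given the file again, so it ignores the pipe and
# counts every line of the file that ends in a digit.
with_file=$(grep '^[[:upper:]]' "$sample" | grep -c '[0-9]$' "$sample")

# The second grep has no file argument, so it reads the first grep's output.
from_pipe=$(grep '^[[:upper:]]' "$sample" | grep -c '[0-9]$')

echo "second grep reading the file: $with_file"   # 4
echo "second grep reading the pipe: $from_pipe"   # 2

rm -f "$sample"
```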

However, you need to expand the second regex if you want to exclude negative numbers. Anyway, a better solution altogether might be to switch to Awk:

awk '/^[A-Z]/ && /[0-9]$/ && $NF > 0' cap*

The output format differs slightly from grep's: with several input files, grep prefixes each match with the file name, whereas Awk prints only the line. If you want to include the name of the matching file, you have to print it explicitly:

awk '/^[A-Z]/ && /[0-9]$/ && $NF > 0 { print FILENAME ":" $0 }' cap*
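Both Awk commands can be tried on a small file. In this sketch the last two input lines are invented to exercise the $NF > 0 test, the temporary file stands in for cap*, and LC_ALL=C is set only to keep the [A-Z] range predictable across locales.

```shell
sample=$(mktemp)
printf '%s\n' 'C line 1' 'c line 2' 'B line 3' 'E line five' 'A value -5' 'D count 0' > "$sample"

# Lines only, as printed by the first command.
plain=$(LC_ALL=C awk '/^[A-Z]/ && /[0-9]$/ && $NF > 0' "$sample")

# Same filter, with the file name prepended grep-style.
named=$(LC_ALL=C awk '/^[A-Z]/ && /[0-9]$/ && $NF > 0 { print FILENAME ":" $0 }' "$sample")

printf '%s\n' "$plain"   # C line 1 and B line 3
printf '%s\n' "$named"   # the same two lines, each prefixed with the temp file's path

rm -f "$sample"
```

The lines ending in -5 and 0 pass both regex tests but are dropped by the numeric comparison on the last field.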

The regex ^[A-Z][0-9]$ matches only a line of exactly two characters: an uppercase letter followed by a digit. To permit arbitrary text between them, use ^[A-Z].*[0-9]$. To be less permissive, replace .* with something more specific, perhaps (.*[^-0-9])?. The parentheses and the question mark (for "optional") require grep -E; in the BRE dialect you get out of the box with grep, put backslashes before the parentheses and write \{0,1\} in place of the question mark (GNU grep also accepts \? there).
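As a rough check that the two dialects agree, the same pattern can be written once for grep -E and once for plain grep. The input lines are illustrative only, and [[:upper:]] is substituted for [A-Z] to sidestep locale-dependent ranges.

```shell
sample=$(mktemp)
printf '%s\n' 'C line 1' 'B line -3' 'E line five' 'F total 42' > "$sample"

# ERE: unescaped parentheses and ? for the optional group.
ere=$(grep -E '^[[:upper:]](.*[^-0-9])?[1-9][0-9]*$' "$sample")

# BRE: backslashed parentheses, and the interval \{0,1\} for "optional".
bre=$(grep '^[[:upper:]]\(.*[^-0-9]\)\{0,1\}[1-9][0-9]*$' "$sample")

printf '%s\n' "$ere"
# prints:
# C line 1
# F total 42

# Both dialects should select the same lines.
[ "$ere" = "$bre" ] && echo "ERE and BRE agree"

rm -f "$sample"
```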

tripleee answered Nov 03 '25 14:11