
Bash (grep) regex performing unexpectedly

I have a text file that contains a date in the form dd/mm/yyyy (e.g. 20/12/2012).

I am trying to use grep to extract the date and show it in the terminal. This works until I hit a certain case.

These are my test cases:

  • grep -E "\d*" returns 20/12/2012
  • grep -E "\d*/" returns 20/12/2012
  • grep -E "\d*/\d*" returns 20/12/2012
  • grep -E "\d*/\d*/" returns nothing
  • grep -E "\d+" also returns nothing

Could someone explain to me why I get this unexpected behavior?

EDIT: I get the same behavior if I replace the double quotes (weak quoting) with single quotes (strong quoting).

NlightNFotis asked Jan 15 '13 14:01


4 Answers

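The syntax you used (\d) is not part of POSIX extended regular expressions, which is what grep -E uses. The regex dialect here belongs to grep, not to Bash.

Use grep -P instead (a GNU grep option), which uses Perl-compatible regular expressions (PCRE). For example: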

grep -P "\d+/\d+/\d+" input.txt
grep -P "\d{2}/\d{2}/\d{4}" input.txt  # more restrictive

Or, to stick with extended regex, use [0-9] in place of \d:

grep -E "[0-9]+/[0-9]+/[0-9]+" input.txt
grep -E "[0-9]{2}/[0-9]{2}/[0-9]{4}" input.txt  # more restrictive
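As a minimal sketch of the extended-regex form in use, assuming a throwaway sample file whose name and contents are made up purely for illustration, adding -o prints just the date rather than the whole line:

```shell
# Create a temporary sample file (hypothetical contents, for illustration only).
tmpfile=$(mktemp)
printf 'released on 20/12/2012\nno date here\n' > "$tmpfile"

# -E: POSIX extended regex; -o: print only the matched text.
matched=$(grep -Eo '[0-9]{2}/[0-9]{2}/[0-9]{4}' "$tmpfile")
echo "$matched"

rm -f "$tmpfile"
```

Single quotes are used here so the shell passes the pattern to grep untouched, although double quotes behave the same for this particular pattern.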
Shawn Chin answered Nov 12 '22 00:11


You could also use -P instead of -E, which lets grep use PCRE syntax:

grep -P "\d+/\d+" file

works too.

peteches answered Nov 12 '22 00:11


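grep and egrep/grep -E don't recognize \d as a digit class (GNU grep typically treats it as a literal d). Your first three patterns match only because the asterisk makes \d optional: \d* can match zero characters, so the digits are never matched by \d at all. The fourth pattern fails because, with \d* matching nothing, it needs two slashes in a row, and \d+ fails because it needs at least one match for \d.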

Use [0-9] or [[:digit:]].
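The effect of the asterisk can be reproduced without \d at all. Here is a small sketch, using the letter x as a stand-in for any token that never appears in the input (the variable names are only for illustration):

```shell
line='20/12/2012'

# x* permits zero repetitions, so it matches the empty string on any line.
star=$(printf '%s\n' "$line" | grep -cE 'x*' || true)

# x+ needs at least one x, which the line does not contain.
plus=$(printf '%s\n' "$line" | grep -cE 'x+' || true)

# [[:digit:]]+ is the portable way to require one or more digits.
digits=$(printf '%s\n' "$line" | grep -cE '[[:digit:]]+' || true)

echo "x*: $star  x+: $plus  [[:digit:]]+: $digits"
```

grep -c prints the number of matching lines, so a 1 for x* next to a 0 for x+ shows that the starred pattern matched without consuming anything.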

Explosion Pills answered Nov 12 '22 00:11


To help troubleshoot cases like this, the -o flag can be helpful as it shows only the matched portion of the line. With your original expressions:

grep -Eo "\d*" returns nothing - a clue that \d isn't doing what you thought it was.

grep -Eo "\d*/" returns / (twice) - confirmation that \d isn't matching while the slashes are.

As noted by others, the -P flag solves the issue by recognizing \d. To expand on Explosion Pills' answer, you could also stay with -E as follows:

grep -Eo "[[:digit:]]*/[[:digit:]]*/" returns 20/12/

EDIT: Per a comment by @shawn-chin (thanks!), --color can be used similarly to highlight the portions of the line that are matched while still showing the entire line:

grep -E --color "[[:digit:]]*/[[:digit:]]*/" returns 20/12/2012 (color can't be shown here, but the matched "20/12/" portion would be highlighted)
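The same -o technique can be used to build a pattern up piece by piece. A rough sketch, with the date fed in through a pipe instead of a file (the variable names are only for illustration):

```shell
line='20/12/2012'

# Each match is printed on its own line: first "20/", then "12/".
parts=$(printf '%s\n' "$line" | grep -Eo '[[:digit:]]*/')

# Extending the pattern shows how far the regex gets through the date.
prefix=$(printf '%s\n' "$line" | grep -Eo '[[:digit:]]*/[[:digit:]]*/')

printf 'single step:\n%s\n' "$parts"
printf 'two steps: %s\n' "$prefix"
```

If a step prints less than expected, the piece added most recently is the one that isn't matching.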

David Ravetti answered Nov 12 '22 00:11