Grep does not show results, online regex tester does

Tags: regex, grep

I am fairly inexperienced with grep. I have a bunch of XML files that contain lines like these:

<identifier type="abc">abc:def.ghi/g1234.ab012345</identifier>
<identifier type="abc">abc:def.ghi/g5678m.ab678901</identifier>

I wanted to get the identifier part after the slash and constructed a regex using RegexPal:

[a-z]\d{4}[a-z]*\.[a-z]*\d*

It highlights everything that I wanted. Perfect. But when I run grep on the very same file, I don't get any results. As I said, I really don't know much about grep, so I tried several different combinations:

grep [a-z]\d{4}[a-z]*\.[a-z]*\d* test.xml
grep "[a-z]\d{4}[a-z]*\.[a-z]*\d*" test.xml
egrep "[a-z]\d{4}[a-z]*\.[a-z]*\d*" test.xml
grep '[a-z]\d{4}[a-z]*\.[a-z]*\d*' test.xml
grep -E '[a-z]\d{4}[a-z]*\.[a-z]*\d*' test.xml

What am I doing wrong?

slhck asked Nov 16 '10 09:11


2 Answers

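Your regex does match the input in a Perl-style engine such as the one behind RegexPal. Let's break it down against g1234.ab012345:

  • [a-z] matches g
  • \d{4} matches 1234
  • [a-z]* matches zero letters here (the next character is .), so \. can match the .
  • [a-z]*\d* matches ab012345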

Also, grep's basic and extended syntaxes do not support \d. Use either [0-9] or [[:digit:]] (the class name [:digit:] only works inside a bracket expression).
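To make the digit-class point concrete, here is a minimal sketch that runs both spellings against the two sample lines from the question. The temporary file and the variable names are assumptions made for illustration; the patterns are the question's regex with \d swapped out.

```shell
# Sample file holding the two lines from the question (temporary path, for illustration).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<identifier type="abc">abc:def.ghi/g1234.ab012345</identifier>
<identifier type="abc">abc:def.ghi/g5678m.ab678901</identifier>
EOF

# Count matching lines with each spelling of "a digit" (-c prints the count, -E enables {4}).
range_count=$(grep -cE '[a-z][0-9]{4}[a-z]*\.[a-z]*[0-9]*' "$sample")
class_count=$(grep -cE '[a-z][[:digit:]]{4}[a-z]*\.[a-z]*[[:digit:]]*' "$sample")

printf '[0-9]: %s, [[:digit:]]: %s\n' "$range_count" "$class_count"
# prints: [0-9]: 2, [[:digit:]]: 2
```

Both forms should behave the same for this input; [[:digit:]] is simply the POSIX class name for the same set of characters.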

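Finally, when a pattern uses operators such as {4}, prefer egrep (equivalent to grep -E) to plain grep. egrep uses extended regular expressions, where {n}, +, ? and | work as written; plain grep uses basic regular expressions, where the interval has to be spelled \{4\}. Also, in many shells (including bash on OS X, as you mentioned), put the pattern in single quotes rather than double quotes or no quotes at all. Left unquoted, the pattern loses its backslashes and may be expanded by the shell to a list of files in the current directory before grep sees it; inside double quotes, $, backticks and some backslash sequences are still interpreted. Bash won't touch anything in single quotes.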
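The two points above (basic versus extended syntax, and quoting) can be checked with a small sketch like the following. The one-line sample, the temporary paths and the variable names are assumptions for illustration, and the quoting part assumes a POSIX-style shell such as bash with default options.

```shell
# Hypothetical one-line sample taken from the question's data.
sample=$(mktemp)
printf '%s\n' 'abc:def.ghi/g1234.ab012345' > "$sample"

# Basic syntax (plain grep): a bare {4} is literal text, so nothing matches.
# "|| true" only hides grep's non-zero exit status when the count is 0.
bre_bare=$(grep -c '[a-z][0-9]{4}' "$sample" || true)
# Basic syntax with the interval braces escaped.
bre_escaped=$(grep -c '[a-z][0-9]\{4\}' "$sample" || true)
# Extended syntax (grep -E, i.e. egrep) takes {4} as written.
ere=$(grep -cE '[a-z][0-9]{4}' "$sample" || true)
printf 'basic bare: %s, basic escaped: %s, extended: %s\n' "$bre_bare" "$bre_escaped" "$ere"

# What the shell hands to a command, run in an empty directory so that
# no file name can match the unquoted word as a glob.
empty_dir=$(mktemp -d)
unquoted=$(cd "$empty_dir" && printf '%s' [a-z]\d{4})
quoted=$(cd "$empty_dir" && printf '%s' '[a-z]\d{4}')
printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```

The counts should come out as 0, 1 and 1. In the quoting part, the unquoted word should arrive without its backslash ([a-z]d{4}), while the single-quoted one arrives intact, which is why grep never sees \d in the unquoted attempt.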

Jon answered Sep 24 '22 21:09


grep doesn't support \d by default. To match a digit, use [0-9], or enable Perl-compatible regular expressions with -P (available in GNU grep):

$ grep -P "[a-z]\d{4}[a-z]*\.[a-z]*\d*" test.xml

or:

$ egrep "[a-z][0-9]{4}[a-z]*\.[a-z]*[0-9]*" test.xml
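Since the question was about getting only the identifier after the slash, the -o option may also help: it prints just the matched text instead of the whole line. This is a sketch under the assumption that your grep supports -o (GNU and BSD grep do, but it is not required by POSIX); the temporary file and the variable name are made up for illustration.

```shell
# Sample file holding the two lines from the question (temporary path, for illustration).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<identifier type="abc">abc:def.ghi/g1234.ab012345</identifier>
<identifier type="abc">abc:def.ghi/g5678m.ab678901</identifier>
EOF

# -o prints only the part of each line that matched; -E enables {4} as written.
ids=$(grep -oE '[a-z][0-9]{4}[a-z]*\.[a-z]*[0-9]*' "$sample")
printf '%s\n' "$ids"
# prints:
# g1234.ab012345
# g5678m.ab678901
```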
Kobi answered Sep 25 '22 21:09