I'm having a bit of an issue with grep that I can't seem to figure out. I'm trying to search for all instances of lower case words enclosed in double quotes (C strings) in a set of source files. Using bash and gnu grep:
grep -e '"[a-z]+"' *.cpp
gives me no matches, while
grep -e '"[a-z]*"' *.cpp
gives me matches like "Abc" which is not just lower case characters. What is the proper regular expression to match only "abc"?
These special characters, called metacharacters, also have special meaning to the system and need to be quoted or escaped. Whenever you use a grep regular expression at the command prompt, surround it with quotes, or escape metacharacters (such as & ! . * $ ? and \ ) with a backslash ( \ ).
If you include special characters in patterns typed on the command line, escape them by enclosing them in single quotation marks to prevent inadvertent misinterpretation by the shell or command interpreter. To match a character that is special to grep –E, put a backslash ( \ ) in front of the character.
The forward slash is not a special character in grep, but may be in tools like sed, Ruby, or Perl. You probably want to escape your literal periods, though, and it does no harm to escape the slash. This should work in all cases: \.
To use grep as a filter, you must pipe the output of the command through grep . The symbol for pipe is “ | ”. The following example displays files that end in “ .
You're forgetting to escape the meta characters.
grep -e '"[a-z]\+"'
For the second part, the reason it is matching multi-case characters is because of your locale. As follows:
$ echo '"Abc"' | grep -e '"[a-z]\+"'
"Abc"
$ export LC_ALL=C
$ echo '"Abc"' | grep -e '"[a-z]\+"'
$
To get the "ascii-like" behavior, you need to set your locale to "C", as specified in the grep man page:
Within a bracket expression, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, inclusive, using the locale's collating sequence and character set. For example, in the default C locale, [a-d] is equivalent to [abcd]. Many locales sort characters in dictionary order, and in these locales [a-d] is typically not equivalent to [abcd]; it might be equivalent to [aBbCcDd], for example. To obtain the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to the value C.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With