Grep pattern matching lower case string enclosed in double quotes

Tags:

string

regex

grep

I'm having an issue with grep that I can't figure out. I'm trying to find all lowercase words enclosed in double quotes (C string literals) in a set of source files. Using bash and GNU grep:

grep -e '"[a-z]+"' *.cpp

gives me no matches, while

grep -e '"[a-z]*"' *.cpp

gives me matches like "Abc", which contains an uppercase letter. What is the proper regular expression to match only all-lowercase strings such as "abc"?

Burton Samograd asked May 10 '12 18:05

1 Answer

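You're forgetting to escape the + metacharacter. By default grep uses basic regular expressions, in which a bare + matches a literal plus sign, so your first pattern only matches text like "a+". GNU grep treats \+ as "one or more" (equivalently, use grep -E with an unescaped +):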

grep -e '"[a-z]\+"'
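A minimal sketch comparing that fix with two equivalent spellings, assuming bash and GNU grep. The file name sample.cpp and its contents are made up for illustration. The -o flag prints only the matched text instead of whole lines, which makes it easier to see what each pattern actually matched, and LC_ALL=C is set for just these commands so that [a-z] is the plain ASCII range (the locale issue is covered below).

```shell
#!/usr/bin/env bash
# Hypothetical sample input: sample.cpp is made up for this demo.
printf '%s\n' 'puts("abc");' 'puts("Abc");' 'f("xyz", "Q");' > sample.cpp

# GNU basic regex: \+ means "one or more" (a GNU extension to BRE).
LC_ALL=C grep -o '"[a-z]\+"' sample.cpp

# Extended regex (-E): + is special without a backslash.
LC_ALL=C grep -oE '"[a-z]+"' sample.cpp

# Portable BRE: one lowercase letter followed by zero or more more.
LC_ALL=C grep -o '"[a-z][a-z]*"' sample.cpp
```

Each of the three commands should print "abc" and "xyz" on separate lines. Using \+ (or + with -E) instead of * also matters on its own: * allows zero letters, so '"[a-z]*"' would additionally match an empty string literal "".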

For the second part, the pattern matches mixed-case strings because of your locale:

$ echo '"Abc"' | grep -e '"[a-z]\+"'
"Abc"
$ export LC_ALL=C
$ echo '"Abc"' | grep -e '"[a-z]\+"'
$

To get ASCII-like behavior, set your locale to "C", as described in the grep man page:

Within a bracket expression, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, inclusive, using the locale's collating sequence and character set. For example, in the default C locale, [a-d] is equivalent to [abcd]. Many locales sort characters in dictionary order, and in these locales [a-d] is typically not equivalent to [abcd]; it might be equivalent to [aBbCcDd], for example. To obtain the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to the value C.
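A sketch, assuming bash and GNU grep, of scoping the C locale to a single grep invocation instead of exporting LC_ALL for the whole shell session, plus a variant that uses the [[:lower:]] character class. Whether [a-z] matches "A" outside the C locale depends on the locale and the grep version, so this sketch only shows the C-locale behavior.

```shell
#!/usr/bin/env bash
# Sample input: one all-lowercase string and one mixed-case string.
input='"abc" "Abc"'

# A variable assignment before a command applies to that command only,
# so the rest of the session keeps its normal locale.
printf '%s\n' "$input" | LC_ALL=C grep -o '"[a-z]\+"'

# [[:lower:]] names the lowercase-letter class directly;
# in the C locale it is exactly the letters a-z.
printf '%s\n' "$input" | LC_ALL=C grep -o '"[[:lower:]]\+"'
```

Both commands should print only "abc". In a UTF-8 locale, [[:lower:]] may also match non-ASCII lowercase letters such as é, which may or may not be what you want when scanning C string literals.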

Don Stewart answered Nov 08 '22 13:11