 

What Perl regex can match CamelCase words?

Tags: regex, perl

I am searching for words like the following in .todo files:

ZshTabCompletionBackward 
MacTerminalIterm

I wrote the following regex:

[A-Z]{1}[a-z]*[A-Z]{1}[a-z]*

However, it is not enough, since it only matches words with two humps, such as:

ZshTab

In pseudocode, I am trying to write the following regex:

([A-Z]{1}[a-z]*[A-Z]{1}[a-z]*){1-9}

How can I write the above regex in Perl?

asked May 2 '09 at 22:05 by Léo Léopold Hertz 준영



2 Answers

I think you want something like this, written with the /x flag to add comments and insignificant whitespace:

/
   \b      # word boundary so you don't start in the middle of a word

   (          # open grouping
      [A-Z]      # initial uppercase
      [a-z]*     # any number of lowercase letters
   )          # end grouping

   {2,}    # quantifier: at least 2 instances, unbounded max  

   \b      # word boundary
/x

If you want it without the fancy formatting, just remove the whitespace and comments:

/\b([A-Z][a-z]*){2,}\b/
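A quick way to check the compact pattern is to run it over the words from the question. This is a minimal sketch: the `is_camel` helper and the sample word list are hypothetical, and `Zsh` and `ABC` are added only to show a single-hump word and an all-capitals word.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $word contains a whole word made of
# two or more "humps" (an uppercase letter followed by any lowercase letters).
sub is_camel {
    my ($word) = @_;
    return $word =~ /\b([A-Z][a-z]*){2,}\b/ ? 1 : 0;
}

# Sample words: the two from the question, the partial match the
# original regex produced, plus two extra cases for comparison.
for my $word (qw(ZshTabCompletionBackward MacTerminalIterm ZshTab Zsh ABC)) {
    printf "%-26s %s\n", $word, is_camel($word) ? 'match' : 'no match';
}
```

Running it should report a match for every word except `Zsh`; note that `ABC` is accepted, because each capital letter counts as a hump on its own.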

As j_random_hacker points out, this is a bit too simple, since it also matches a word made up only of capital letters, such as ABC. His solution, which I've expanded with /x to show some detail, ensures at least one lowercase letter (and at least two capitals):

/
    \b          # start at word boundary
    [A-Z]       # start with upper
    [a-zA-Z]*   # followed by any alpha

    (?:  # non-capturing grouping for alternation precedence
       [a-z][a-zA-Z]*[A-Z]   # next bit starts lower, then any letters, ending with upper
          |                     # or 
       [A-Z][a-zA-Z]*[a-z]   # next bit starts upper, then any letters, ending with lower
    )

    [a-zA-Z]*   # anything that's left
    \b          # end at word boundary
/x

Again, here it is without the whitespace and comments:

/\b[A-Z][a-zA-Z]*(?:[a-z][a-zA-Z]*[A-Z]|[A-Z][a-zA-Z]*[a-z])[a-zA-Z]*\b/
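To pull every CamelCase word out of a line of a .todo file, the stricter pattern can be used with //g in list context. This is a sketch under assumptions: the sample `$line` is hypothetical, and it relies on the pattern having no capturing groups (only `(?:...)`), so each list element is a whole matched word.

```perl
use strict;
use warnings;

# The stricter pattern from the answer, compiled once with qr//.
my $camel = qr/\b[A-Z][a-zA-Z]*(?:[a-z][a-zA-Z]*[A-Z]|[A-Z][a-zA-Z]*[a-z])[a-zA-Z]*\b/;

# Hypothetical line standing in for one line of a .todo file.
my $line = 'TODO: fix ZshTabCompletionBackward and MacTerminalIterm, not ABC or Zsh';

# With no capturing groups, m//g in list context returns every full match.
my @words = $line =~ /$camel/g;

print "$_\n" for @words;
```

It should print `ZshTabCompletionBackward` and `MacTerminalIterm`, skipping `TODO` and `ABC` (no lowercase letter) and `Zsh` (only one capital).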

I explain all of these features in Learning Perl.

answered Oct 21 '22 at 00:10 by brian d foy



Assuming you aren't using the regex to extract words, but only to match them:

[A-Z][a-zA-Z]*

Isn't the only real requirement that it's all letters and starts with a capital letter?
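The sketch below tries this simpler pattern on a few words. The `^` and `$` anchors are an added assumption (each string tested is already a single, isolated word), and the word list is hypothetical. It shows that the pattern also accepts a single capitalized word such as `Zsh` and an all-capitals word such as `ABC`.

```perl
use strict;
use warnings;

# Bill Lynch's pattern, anchored with ^ and $ (added assumption:
# each string tested is already one isolated word).
my $starts_upper = qr/^[A-Z][a-zA-Z]*$/;

for my $word (qw(ZshTabCompletionBackward MacTerminalIterm Zsh ABC zshTab)) {
    my $verdict = $word =~ $starts_upper ? 'match' : 'no match';
    print "$word: $verdict\n";
}
```

Only `zshTab` is rejected, since it does not start with a capital letter.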

answered Oct 20 '22 at 23:10 by Bill Lynch
