
Splitting CamelCase with regex

I have this code to split CamelCase by regular expression:

Regex.Replace(input, "(?<=[a-z])([A-Z])", " $1", RegexOptions.Compiled).Trim();

However, it doesn't split this correctly: ShowXYZColours

It produces Show XYZColours instead of Show XYZ Colours

How do I get the desired result?

Sean asked Jan 24 '14 07:01


3 Answers


You can use something like this:

(?<=[a-z])([A-Z])|(?<=[A-Z])([A-Z][a-z])

CODE:

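// First alternative: an uppercase letter preceded by a lowercase letter (the "X" in "wX").
// Second alternative: an uppercase+lowercase pair preceded by an uppercase letter (the "Co" in "ZCo").
string strRegex = @"(?<=[a-z])([A-Z])|(?<=[A-Z])([A-Z][a-z])";
Regex myRegex = new Regex(strRegex, RegexOptions.None);
string strTargetString = @"ShowXYZColours";
// Only one of the two groups participates in any match, so " $1$2" just prefixes the match with a space.
string strReplace = @" $1$2";

return myRegex.Replace(strTargetString, strReplace);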

OUTPUT:

Show XYZ Colours


Sujith PS answered Nov 17 '22 11:11


Using Tomalak's regex with .NET's System.Text.RegularExpressions creates an empty entry at position 0 of the resulting array, because the first alternative also matches at the very start of a PascalCase string:

Regex.Split("ShowXYZColors", @"(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})")

{string[4]}
    [0]: ""
    [1]: "Show"
    [2]: "XYZ"
    [3]: "Colors"

It works for camelCase though (as opposed to PascalCase), since the input then starts with a lowercase letter and nothing matches at position 0:

Regex.Split("showXYZColors", @"(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})")

{string[3]}
    [0]: "show"
    [1]: "XYZ"
    [2]: "Colors"
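
One possible way to avoid the empty leading entry is to forbid the first alternative from matching at the start of the input with a negative look-behind, (?<!^). The following is only a sketch; the class name CamelCaseSplitter and the method name SplitWords are made up for illustration and do not come from any of the answers.

```csharp
using System.Text.RegularExpressions;

public static class CamelCaseSplitter
{
    // Same pattern as above, with (?<!^) added so that the first
    // alternative cannot match at the very start of the input.
    // (Hypothetical helper, not part of the original answers.)
    private static readonly Regex Boundary = new Regex(
        @"(?<!^)(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})");

    // Splits the input at every zero-width word boundary found by the pattern.
    public static string[] SplitWords(string input)
    {
        return Boundary.Split(input);
    }
}
```

With this guard, CamelCaseSplitter.SplitWords("ShowXYZColors") should give "Show", "XYZ", "Colors" without the empty first element, and "showXYZColors" still gives "show", "XYZ", "Colors". Filtering out empty strings after the split would be another option.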

dr. rAI answered Nov 17 '22 11:11


Unicode-aware

(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})

Breakdown:

(?=               # look-ahead: a position followed by...
  \p{Lu}\p{Ll}    #   an uppercase and a lowercase
)                 #
|                 # or
(?<=              # look-behind: a position after...
  \p{Ll}          #   a lowercase
)                 #
(?=               # look-ahead: a position followed by...
  \p{Lu}          #   an uppercase
)                 #

Use it with your regex engine's split function.


EDIT: Of course you can replace \p{Lu} with [A-Z] and \p{Ll} with [a-z] if that's what you need or your regex engine does not understand Unicode categories.
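
As a rough illustration of the ASCII variant in .NET, the pattern can also be passed to Regex.Replace to produce the space-separated string the question asks for. Because the pattern only matches positions, each match is replaced by a single space; Trim() then removes the space inserted at position 0 for PascalCase input. The names CamelCaseSpacer and InsertSpaces are hypothetical, chosen only for this sketch.

```csharp
using System.Text.RegularExpressions;

public static class CamelCaseSpacer
{
    // ASCII-only version of the pattern: [A-Z] instead of \p{Lu},
    // [a-z] instead of \p{Ll}. (Hypothetical helper for illustration.)
    private static readonly Regex Boundary = new Regex(
        @"(?=[A-Z][a-z])|(?<=[a-z])(?=[A-Z])");

    // Inserts a space at every boundary, then trims the space that
    // is added at the start when the input begins with "Xx".
    public static string InsertSpaces(string input)
    {
        return Boundary.Replace(input, " ").Trim();
    }
}
```

CamelCaseSpacer.InsertSpaces("ShowXYZColours") should return "Show XYZ Colours". Note that Trim() also strips any whitespace the input already had at either end, which may or may not be what you want.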

Tomalak answered Nov 17 '22 12:11