I have this code to split CamelCase by regular expression:
Regex.Replace(input, "(?<=[a-z])([A-Z])", " $1", RegexOptions.Compiled).Trim();
However, it doesn't split this correctly: ShowXYZColours
It produces Show XYZColours
instead of Show XYZ Colours
How do I get the desired result?
You can use something like this:
(?<=[a-z])([A-Z])|(?<=[A-Z])([A-Z][a-z])
CODE:
string strRegex = @"(?<=[a-z])([A-Z])|(?<=[A-Z])([A-Z][a-z])";
Regex myRegex = new Regex(strRegex, RegexOptions.None);
string strTargetString = @"ShowXYZColours";
string strReplace = @" $1$2";
return myRegex.Replace(strTargetString, strReplace);
OUTPUT:
Show XYZ Colours
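For reuse, the replacement above could be wrapped in a small helper. This is only a sketch: the class name CamelCaseSplitter and the method name InsertSpaces are made up for illustration, and the pattern is the ASCII-only one from this answer.

```csharp
using System.Text.RegularExpressions;

public static class CamelCaseSplitter
{
    // Same pattern as in the answer: a capital that follows a lowercase letter,
    // or a capital that follows another capital and is itself followed by a
    // lowercase letter (the start of a new word after an acronym).
    private static readonly Regex Boundary =
        new Regex(@"(?<=[a-z])([A-Z])|(?<=[A-Z])([A-Z][a-z])");

    // Hypothetical helper: inserts a space in front of every matched boundary.
    // Only one of the two groups participates in any match; the other one
    // expands to an empty string in the replacement.
    public static string InsertSpaces(string input)
    {
        return Boundary.Replace(input, " $1$2");
    }
}
```

With this helper, CamelCaseSplitter.InsertSpaces("ShowXYZColours") gives "Show XYZ Colours", and "HTTPRequest" becomes "HTTP Request".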
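Using Tomalak's regex with .NET System.Text.RegularExpressions creates an empty entry at position 0 of the resulting array, because the first alternative also matches at the very start of the string, right before the leading uppercase letter: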
Regex.Split("ShowXYZColors", @"(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})")
{string[4]}
[0]: ""
[1]: "Show"
[2]: "XYZ"
[3]: "Colors"
It works for camelCase though (as opposed to PascalCase):
Regex.Split("showXYZColors", @"(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})")
{string[3]}
[0]: "show"
[1]: "XYZ"
[2]: "Colors"
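One possible way to avoid the empty first entry for PascalCase input is to forbid a match at the start of the string with a (?<!^) guard on the first alternative. The sketch below assumes that tweak; it is not part of Tomalak's original pattern, and the names PascalCaseSplitter and SplitWords are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class PascalCaseSplitter
{
    // (?<!^) rejects a match at index 0, so a leading uppercase letter no
    // longer produces an empty first element. The second alternative needs a
    // preceding lowercase letter, so it can never match at index 0 anyway.
    private const string Pattern =
        @"(?<!^)(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})";

    // Hypothetical helper: splits at the zero-width word boundaries.
    public static string[] SplitWords(string input)
    {
        return Regex.Split(input, Pattern);
    }
}
```

Both SplitWords("ShowXYZColors") and SplitWords("showXYZColors") then return three entries, with "Show" or "show" at index 0. Filtering out empty strings after the split would be a simpler alternative.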
Unicode-aware
(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})
Breakdown:
(?=               # look-ahead: a position followed by...
  \p{Lu}\p{Ll}    #   an uppercase and a lowercase
)                 #
|                 # or
(?<=              # look-behind: a position after...
  \p{Ll}          #   a lowercase
)                 #
(?=               # look-ahead: a position followed by...
  \p{Lu}          #   an uppercase
)                 #
Use with your regex split function.
EDIT: Of course you can replace \p{Lu} with [A-Z] and \p{Ll} with [a-z] if that's what you need or your regex engine does not understand Unicode categories.
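To see what the swap changes in practice, here is a small .NET sketch that runs both variants on the same input. The class and member names are hypothetical, and the sample string "prixÉté" (written with \u escapes) is just an illustration with a non-ASCII capital.

```csharp
using System.Text.RegularExpressions;

public static class SplitPatterns
{
    // Unicode-aware pattern from this answer.
    public const string UnicodePattern = @"(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})";

    // Same structure with the ASCII-only character classes swapped in.
    public const string AsciiPattern = @"(?=[A-Z][a-z])|(?<=[a-z])(?=[A-Z])";

    // Hypothetical helper: thin wrapper around Regex.Split.
    public static string[] Split(string input, string pattern)
    {
        return Regex.Split(input, pattern);
    }
}
```

For the input "prix\u00C9t\u00E9", the Unicode pattern splits before the É and returns two entries, while the ASCII pattern finds no boundary and returns the input as a single entry.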