I have this code to split CamelCase by regular expression:
Regex.Replace(input, "(?<=[a-z])([A-Z])", " $1", RegexOptions.Compiled).Trim();
However, it doesn't split this correctly: ShowXYZColours
It produces Show XYZColours instead of Show XYZ Colours
How do I get the desired result?
You can use something like this:
(?<=[a-z])([A-Z])|(?<=[A-Z])([A-Z][a-z])
Code:
string strRegex = @"(?<=[a-z])([A-Z])|(?<=[A-Z])([A-Z][a-z])";
Regex myRegex = new Regex(strRegex, RegexOptions.None);
string strTargetString = @"ShowXYZColours";
string strReplace = @" $1$2";
return myRegex.Replace(strTargetString, strReplace);
Output:
Show XYZ Colours
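For reuse, the replacement above could be wrapped in a small helper along these lines. This is only a sketch: the names CamelCaseText and InsertSpaces are made up for illustration, and like the pattern itself it assumes ASCII letters.

```csharp
using System.Text.RegularExpressions;

public static class CamelCaseText
{
    // Same pattern as above: an uppercase letter that follows a lowercase one,
    // or an uppercase+lowercase pair that follows an uppercase letter.
    private static readonly Regex WordStart =
        new Regex(@"(?<=[a-z])([A-Z])|(?<=[A-Z])([A-Z][a-z])");

    // Puts a space in front of every match. Only one of the two groups
    // takes part in any given match; the other expands to an empty string.
    public static string InsertSpaces(string input)
    {
        return WordStart.Replace(input, " $1$2");
    }
}
```

CamelCaseText.InsertSpaces("ShowXYZColours") gives Show XYZ Colours. Both alternatives start with a look-behind that needs a preceding letter, so nothing can match at position 0 and the Trim() from the question should not be needed.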
Using Tomalak's regex with .NET's System.Text.RegularExpressions produces an empty entry at position 0 of the resulting array, because the first alternative also matches at the very start of a PascalCase string:
Regex.Split("ShowXYZColors", @"(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})")
{string[4]}
    [0]: ""
    [1]: "Show"
    [2]: "XYZ"
    [3]: "Colors"
It works for camelCase though (as opposed to PascalCase), since no boundary matches before a leading lowercase letter:
Regex.Split("showXYZColors", @"(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})")
{string[3]}
    [0]: "show"
    [1]: "XYZ"
    [2]: "Colors"
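If the empty leading entry is unwanted, one possible workaround (a sketch, not part of Tomalak's answer) is to put (?<!^) in front of the first alternative so it cannot match at the start of the string. The names PascalCaseSplitter and Split are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class PascalCaseSplitter
{
    // (?<!^) rejects position 0, so a leading "uppercase + lowercase" pair
    // no longer produces a zero-width match before the first character.
    private static readonly Regex Boundary =
        new Regex(@"(?<!^)(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})");

    public static string[] Split(string input)
    {
        return Boundary.Split(input);
    }
}
```

With this, PascalCaseSplitter.Split("ShowXYZColors") yields Show, XYZ, Colors without the empty first element, and showXYZColors still splits as before. Filtering the array for non-empty strings would be another way to get the same result.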
Unicode-aware:
(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})
Breakdown:
(?=               # look-ahead: a position followed by...
  \p{Lu}\p{Ll}    #   an uppercase letter and then a lowercase letter
)                 #
|                 # or
(?<=              # look-behind: a position after...
  \p{Ll}          #   a lowercase letter
)                 #
(?=               # look-ahead: a position followed by...
  \p{Lu}          #   an uppercase letter
)                 #
Use with your regex split function.
EDIT: Of course you can replace \p{Lu} with [A-Z] and \p{Ll} with [a-z] if that's what you need or your regex engine does not understand Unicode categories.
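To illustrate what that substitution changes, here is a rough side-by-side sketch in C#. The class and member names are invented for the example; only the two patterns come from the answer.

```csharp
using System.Text.RegularExpressions;

public static class BoundaryPatterns
{
    // Unicode categories: Lu = uppercase letter, Ll = lowercase letter.
    public const string Unicode = @"(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})";

    // ASCII-only equivalent with [A-Z] and [a-z] substituted in.
    public const string Ascii = @"(?=[A-Z][a-z])|(?<=[a-z])(?=[A-Z])";

    public static string[] SplitUnicode(string input)
    {
        return Regex.Split(input, Unicode);
    }

    public static string[] SplitAscii(string input)
    {
        return Regex.Split(input, Ascii);
    }
}
```

For plain ASCII input the two behave the same. They differ once non-ASCII letters appear: for "show\u00C9coleName" (showÉcoleName), SplitUnicode returns show, École, Name, while SplitAscii does not treat É as uppercase and returns showÉcole, Name.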