I want to split camelCase or PascalCase words into a space-separated collection of words.
So far, I have:
Regex.Replace(value, @"(\B[A-Z]+?(?=[A-Z][^A-Z])|\B[A-Z]+?(?=[^A-Z]))", " $0", RegexOptions.Compiled);
It works fine for converting "TestWord" to "Test Word" and for leaving single words untouched, e.g. Testing remains Testing.
However, ABCTest gets converted to A B C Test when I would prefer ABC Test.
Try:
[A-Z][a-z]+|[A-Z]+(?=[A-Z][a-z])|[a-z]+|[A-Z]+
An example on Regex101
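using System.Linq;
using System.Text.RegularExpressions;

string strText = " TestWord asdfDasdf ABCDef";
// Collect every word-like token, then join the tokens with single spaces.
string[] matches = Regex.Matches(strText, @"[A-Z][a-z]+|[A-Z]+(?=[A-Z][a-z])|[a-z]+|[A-Z]+")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
string result = String.Join(" ", matches);
// result == "Test Word asdf Dasdf ABC Def"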
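The snippet above could be wrapped in a small reusable helper. The following is only a minimal sketch: the names CamelCaseSplitter and SplitCamelCase are hypothetical, and only the pattern itself comes from the answer. Because the approach collects matches rather than inserting spaces, any character that is not an ASCII letter (digits, punctuation, whitespace) is dropped from the result.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CamelCaseSplitter
{
    // Pattern taken from the answer above. The order of the alternatives
    // matters: capitalized words first, then an uppercase run that stops
    // before the last capital of a following word, then lowercase runs,
    // then any remaining uppercase run.
    private static readonly Regex WordPattern = new Regex(
        @"[A-Z][a-z]+|[A-Z]+(?=[A-Z][a-z])|[a-z]+|[A-Z]+",
        RegexOptions.Compiled);

    // Hypothetical helper: returns the matched words joined by single spaces.
    public static string SplitCamelCase(string value)
    {
        var words = WordPattern.Matches(value)
            .Cast<Match>()
            .Select(m => m.Value);
        return string.Join(" ", words);
    }
}
```

Under these assumptions, CamelCaseSplitter.SplitCamelCase("ABCTest") should give "ABC Test", while "TestWord" gives "Test Word" and "Testing" is returned unchanged.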
In the example string:
TestWord qwerDasdf
ABCTest Testing ((*&^%$CamelCase!"£$%^^))
asdfAasdf
AaBbbCD
[A-Z][a-z]+ matches: Test, Word, Dasdf, Test, Testing, Camel, Case, Aasdf, Aa, Bbb
[A-Z]+(?=[A-Z][a-z]) matches: ABC
[a-z]+ matches: qwer, asdf
[A-Z]+ matches: CD
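To see which alternative produced each token, one option is to wrap every alternative in a named group and check which group succeeded. This is a sketch only: the group names (capitalized, acronym, lower, upper) and the AlternativeInspector class are made up for illustration, and the alternatives themselves are unchanged from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AlternativeInspector
{
    // Same four alternatives as in the answer, each in a named group.
    // The group names are illustrative and do not affect what is matched.
    private static readonly Regex Labeled = new Regex(
        @"(?<capitalized>[A-Z][a-z]+)|(?<acronym>[A-Z]+(?=[A-Z][a-z]))|(?<lower>[a-z]+)|(?<upper>[A-Z]+)");

    private static readonly string[] GroupNames =
        { "capitalized", "acronym", "lower", "upper" };

    // Returns one "groupName: token" entry per match, in match order.
    public static List<string> Describe(string input)
    {
        var lines = new List<string>();
        foreach (Match m in Labeled.Matches(input))
        {
            foreach (var name in GroupNames)
            {
                if (m.Groups[name].Success)
                {
                    lines.Add(name + ": " + m.Value);
                    break;
                }
            }
        }
        return lines;
    }
}
```

For the input "ABCTest AaBbbCD", Describe should report ABC as acronym, Test, Aa and Bbb as capitalized, and CD as upper, which agrees with the breakdown listed above.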
Here is my attempt:
(?<!^|\b|\p{Lu})\p{Lu}+(?=\p{Ll}|\b)|(?<!^\p{Lu}*|\b)\p{Lu}(?=\p{Ll}|(?<!\p{Lu}*)\b)
This regex can be used with Regex.Replace and " $0" (a space followed by the whole match) as the replacement string:
Regex.Replace(value, @"(?<!^|\b|\p{Lu})\p{Lu}+(?=\p{Ll}|\b)|(?<!^\p{Lu}*|\b)\p{Lu}(?=\p{Ll}|(?<!\p{Lu}*)\b)", " $0", RegexOptions.Compiled);
See demo
Regex Explanation:
(?<!^|\b|\p{Lu})\p{Lu}+(?=\p{Ll}|\b)
- the first alternative, which matches one or more uppercase letters that are not preceded by the start of the string, a word boundary, or another uppercase letter, and that are followed by a lowercase letter or a word boundary.
(?<!^\p{Lu}*|\b)\p{Lu}(?=\p{Ll}|(?<!\p{Lu}*)\b)
- the second alternative, which matches a single uppercase letter that is not preceded by the start of the string with optional uppercase letters right after it, or by a word boundary, and that is followed by a lowercase letter or by a word boundary that is not preceded by optional uppercase letters.