Ignore existing spaces in converting CamelCase to string with spaces

Tags: c#, .net, regex

I want to split camelCase or PascalCase words into a space-separated sequence of words.

So far, I have:

Regex.Replace(value, @"(\B[A-Z]+?(?=[A-Z][^A-Z])|\B[A-Z]+?(?=[^A-Z]))", " $0", RegexOptions.Compiled);

It works fine for converting "TestWord" to "Test Word" and for leaving single words untouched, e.g. Testing remains Testing.

However, ABCTest gets converted to A B C Test when I would prefer ABC Test.

Ciaran Martin asked Jun 05 '15 08:06




2 Answers

Try:

[A-Z][a-z]+|[A-Z]+(?=[A-Z][a-z])|[a-z]+|[A-Z]+

An example on Regex101


How it is used in C#

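using System.Linq;
using System.Text.RegularExpressions;

string strText = " TestWord asdfDasdf  ABCDef";

// Collect every run of letters the pattern recognises, then join the runs with single spaces.
string[] matches = Regex.Matches(strText, @"[A-Z][a-z]+|[A-Z]+(?=[A-Z][a-z])|[a-z]+|[A-Z]+")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

string result = String.Join(" ", matches);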

result = 'Test Word asdf Dasdf ABC Def'
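
To check the pattern against the inputs from the question, the snippet above can be wrapped in a small helper. This is only a sketch: the class name CamelCaseSplitter and the method name SplitWords are made up for illustration and do not come from the answer.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CamelCaseSplitter
{
    // Same four alternatives as in the answer, tried left to right at each position.
    private static readonly Regex WordPattern = new Regex(
        @"[A-Z][a-z]+|[A-Z]+(?=[A-Z][a-z])|[a-z]+|[A-Z]+");

    // Hypothetical helper: returns the matched letter runs joined by single spaces.
    public static string SplitWords(string value)
    {
        var words = WordPattern.Matches(value)
            .Cast<Match>()
            .Select(m => m.Value);
        return string.Join(" ", words);
    }
}
```

With this helper, SplitWords("ABCTest") gives "ABC Test", SplitWords("TestWord") gives "Test Word", and SplitWords("Testing") stays "Testing". Because only runs of ASCII letters are matched, anything else in the input (digits, punctuation, repeated spaces) is dropped rather than preserved.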


How it works

In the example string:

TestWord qwerDasdf
ABCTest Testing    ((*&^%$CamelCase!"£$%^^))
asdfAasdf
AaBbbCD

[A-Z][a-z]+ matches:

  • [0-4] Test
  • [4-8] Word
  • [13-18] Dasdf
  • [22-26] Test
  • [27-34] Testing
  • [45-50] Camel
  • [50-54] Case
  • [68-73] Aasdf
  • [74-76] Aa
  • [76-79] Bbb

[A-Z]+(?=[A-Z][a-z]) matches:

  • [19-22] ABC

[a-z]+ matches:

  • [9-13] qwer
  • [64-68] asdf

[A-Z]+ matches:

  • [79-81] CD
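
A breakdown like the one above can be reproduced by wrapping each alternative in a named group and checking which group succeeded for each match. This is a sketch, not part of the original answer: the group names (capitalized, acronym, lower, upper) and the MatchTracer class are invented for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchTracer
{
    // The answer's four alternatives, each wrapped in a named group (names are illustrative).
    private static readonly Regex Pattern = new Regex(
        @"(?<capitalized>[A-Z][a-z]+)|(?<acronym>[A-Z]+(?=[A-Z][a-z]))|(?<lower>[a-z]+)|(?<upper>[A-Z]+)");

    private static readonly string[] GroupNames = { "capitalized", "acronym", "lower", "upper" };

    // Returns one line per match: "[start-end] value (alternative)", with end exclusive.
    public static List<string> Trace(string input)
    {
        var lines = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            // Exactly one named group succeeds per match, because the groups are alternatives.
            string alternative = GroupNames.First(name => m.Groups[name].Success);
            lines.Add($"[{m.Index}-{m.Index + m.Length}] {m.Value} ({alternative})");
        }
        return lines;
    }
}
```

For example, Trace("ABCTest CD") yields "[0-3] ABC (acronym)", "[3-7] Test (capitalized)" and "[8-10] CD (upper)".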

thodic answered Nov 06 '22 00:11


Here is my attempt:

(?<!^|\b|\p{Lu})\p{Lu}+(?=\p{Ll}|\b)|(?<!^\p{Lu}*|\b)\p{Lu}(?=\p{Ll}|(?<!\p{Lu}*)\b)

This regex can be used with Regex.Replace and " $0" (a space followed by the whole match) as the replacement string.

Regex.Replace(value, @"(?<!^|\b|\p{Lu})\p{Lu}+(?=\p{Ll}|\b)|(?<!^\p{Lu}*|\b)\p{Lu}(?=\p{Ll}|(?<!\p{Lu}*)\b)", " $0", RegexOptions.Compiled);

See demo

Regex Explanation:

  • The pattern has two alternatives, covering a run of capital letters that comes before or after lowercase letters.
  • (?<!^|\b|\p{Lu})\p{Lu}+(?=\p{Ll}|\b) - the first alternative matches one or more uppercase letters that are not preceded by the start of the string, a word boundary, or another uppercase letter, and that are followed by a lowercase letter or a word boundary.
  • (?<!^\p{Lu}*|\b)\p{Lu}(?=\p{Ll}|(?<!\p{Lu}*)\b) - the second alternative matches a single uppercase letter that is not preceded by a word boundary or by a run of zero or more uppercase letters reaching back to the start of the string, and that is followed by a lowercase letter or by a word boundary that is not preceded by optional uppercase letters.
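
To try this pattern against your own inputs, it can be placed in a small helper like the sketch below. The class name LookaroundSplitter and the method name InsertSpaces are hypothetical. The pattern itself is copied verbatim from the answer.

```csharp
using System.Text.RegularExpressions;

public static class LookaroundSplitter
{
    // Pattern copied verbatim from the answer. It uses variable-length lookbehind,
    // e.g. (?<!^\p{Lu}*|\b), which .NET supports but many other regex engines reject.
    private static readonly Regex Boundary = new Regex(
        @"(?<!^|\b|\p{Lu})\p{Lu}+(?=\p{Ll}|\b)|(?<!^\p{Lu}*|\b)\p{Lu}(?=\p{Ll}|(?<!\p{Lu}*)\b)",
        RegexOptions.Compiled);

    // Hypothetical helper: prefixes every match with a space and leaves all other text as it is.
    public static string InsertSpaces(string value)
    {
        return Boundary.Replace(value, " $0");
    }
}
```

InsertSpaces("TestWord") gives "Test Word", and InsertSpaces("Testing") comes back unchanged. Since Regex.Replace only inserts a space in front of each match, existing spaces and punctuation are kept. Behaviour on other inputs, such as runs of capitals at the very start of the string, is best confirmed by running the helper.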

Wiktor Stribiżew answered Nov 06 '22 00:11