CamelCase conversion to friendly name, i.e. Enum constants; Problems?

Question

In my answer to this question, I mentioned that we used UpperCamelCase parsing to get a description of an enum constant not decorated with a Description attribute, but it was naive, and it didn't work in all cases. I revisited it, and this is what I came up with:

var result = Regex.Replace(camelCasedString, 
                            @"(?<a>(?<!^)[A-Z][a-z])", @" ${a}");
result = Regex.Replace(result,
                            @"(?<a>[a-z])(?<b>[A-Z0-9])", @"${a} ${b}");

The first Replace looks for an uppercase letter, followed by a lowercase letter, EXCEPT where the uppercase letter is the start of the string (to avoid having to go back and trim), and adds a preceding space. It handles your basic UpperCamelCase identifiers, and leading all-upper acronyms like FDICInsured.

The second Replace looks for a lowercase letter followed by an uppercase letter or a number, and inserts a space between the two. This is to handle special but common cases of middle or trailing acronyms, or numbers in an identifier (except leading numbers, which are usually prohibited in C-style languages anyway).

Running some basic unit tests, the combination of these two correctly separated all of the following identifiers: NoDescription, HasLotsOfWords, AAANoDescription, ThisHasTheAcronymABCInTheMiddle, MyTrailingAcronymID, TheNumber3, IDo3Things, IAmAValueWithSingleLetterWords, and Basic (which didn't have any spaces added).

So, I'm posting this first to share it with others who may find it useful, and second to ask two questions:

Anyone see a case that would follow common CamelCase-ish conventions, that WOULDN'T be correctly separated into a friendly string this way? I know it won't separate adjacent acronyms (FDICFCUAInsured), recapitalize "properly" camelCased acronyms like FdicInsured, or capitalize the first letter of a lowerCamelCased identifier (but that one's easy to add - result = Regex.Replace(result, "^[a-z]", m=>m.ToString().ToUpper());). Anything else?
Can anyone see a way to make this one statement, or more elegant? I was looking to combine the Replace calls, but as they do two different things to their matches it can't be done with these two strings. They could be combined into a method chain with a RegexReplace extension method on String, but can anyone think of better?

Philip Rieck · Accepted Answer

So while I agree with Hans Passant here, I have to say that I had to try my hand at making it one regex as an armchair regex user.

(?<a>(?<!^)((?:[A-Z][a-z])|(?:(?<!^[A-Z]+)[A-Z0-9]+(?:(?=[A-Z][a-z])|$))|(?:[0-9]+)))

Is what I came up with. It seems to pass all the tests you put forward in the question.

So

var result = Regex.Replace(camelCasedString, @"(?<a>(?<!^)((?:[A-Z][a-z])|(?:(?<!^[A-Z]+)[A-Z0-9]+(?:(?=[A-Z][a-z])|$))|(?:[0-9]+)))", @" ${a}");

Does it in one pass.

atk · Answer

not that this directly answers the question, but why not test by taking the standard C# API and converting each class into a friendly name? It'd take some manual verification, but it'd give you a good list of standard names to test.

CamelCase conversion to friendly name, i.e. Enum constants; Problems?

Tags:

string

c#

regex

KeithS

2 Answers

Philip Rieck

atk

Recent Activity

Donate For Us

CamelCase conversion to friendly name, i.e. Enum constants; Problems?

Tags:

string

c#

regex

KeithS

2 Answers

Philip Rieck

atk

Related questions

Recent Activity

Donate For Us