
RegEx to split camelCase or TitleCase (advanced)

I found a brilliant RegEx to extract the parts of a camelCase or TitleCase expression.

 (?<!^)(?=[A-Z]) 

It works as expected:

  • value -> value
  • camelValue -> camel / Value
  • TitleValue -> Title / Value

For example with Java:

    String s = "loremIpsum";
    String[] words = s.split("(?<!^)(?=[A-Z])");
    // words is now {"lorem", "Ipsum"}

My problem is that it does not work in some cases:

  • Case 1: VALUE -> V / A / L / U / E
  • Case 2: eclipseRCPExt -> eclipse / R / C / P / Ext

To my mind, the result should be:

  • Case 1: VALUE
  • Case 2: eclipse / RCP / Ext

In other words, given n uppercase chars:

  • if the n chars are followed by lower case chars, the groups should be: (n-1 chars) / (n-th char + lower chars)
  • if the n chars are at the end, the group should be: (n chars).
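To make the gap between the current and the desired behaviour easy to reproduce, here is a minimal sketch that runs the original pattern on the five examples above. The class name `OriginalSplitDemo` and the helper `split` are made up for illustration only.

```java
import java.util.Arrays;

// Hypothetical demo class; the name is illustrative only.
final class OriginalSplitDemo {

    // Pattern from the question: split before every capital letter,
    // except at the very start of the string.
    static final String ORIGINAL = "(?<!^)(?=[A-Z])";

    static String[] split(String s) {
        return s.split(ORIGINAL);
    }

    public static void main(String[] args) {
        String[] inputs = {"value", "camelValue", "TitleValue", "VALUE", "eclipseRCPExt"};
        for (String input : inputs) {
            System.out.println(input + " -> " + Arrays.toString(split(input)));
        }
    }
}
```

The first three inputs come out as wanted. The last two print `[V, A, L, U, E]` and `[eclipse, R, C, P, Ext]` instead of `[VALUE]` and `[eclipse, RCP, Ext]`.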

Any idea on how to improve this regex?

Jmini asked Sep 29 '11 07:09



1 Answer

The following regex works for all of the above examples:

    public static void main(String[] args) {
        for (String w : "camelValue".split("(?<!(^|[A-Z]))(?=[A-Z])|(?<!^)(?=[A-Z][a-z])")) {
            System.out.println(w);
        }
    }

It works by forcing the negative lookbehind to not only ignore matches at the start of the string, but to also ignore matches where a capital letter is preceded by another capital letter. This handles cases like "VALUE".
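As a rough illustration of that first alternative in isolation, the sketch below splits with only `(?<!(^|[A-Z]))(?=[A-Z])`. The class name `FirstClauseDemo` and the helper `split` are hypothetical and exist only for this example.

```java
import java.util.Arrays;

// Hypothetical demo class; the name is illustrative only.
final class FirstClauseDemo {

    // First alternative only: split before a capital letter that is
    // neither at the start of the string nor preceded by another capital.
    static final String FIRST_CLAUSE = "(?<!(^|[A-Z]))(?=[A-Z])";

    static String[] split(String s) {
        return s.split(FIRST_CLAUSE);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("VALUE")));      // [VALUE]
        System.out.println(Arrays.toString(split("TitleValue"))); // [Title, Value]
        System.out.println(Arrays.toString(split("camelValue"))); // [camel, Value]
    }
}
```

In "VALUE", every capital is either at the start or preceded by another capital, so no split point is left and the word stays whole.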

The first part of the regex on its own fails on "eclipseRCPExt" because it does not split between "RCP" and "Ext". This is the purpose of the second clause: (?<!^)(?=[A-Z][a-z]). This clause allows a split before every capital letter that is followed by a lowercase letter, except at the start of the string.
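To see why both clauses are needed, this sketch applies each clause separately and then the combined pattern. The class name `ClauseComparisonDemo` and its constants are assumptions for illustration. The combined string is the same regex as above, just assembled from its two halves.

```java
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
final class ClauseComparisonDemo {

    // Split before a capital that is not at the start and not preceded by a capital.
    static final String FIRST = "(?<!(^|[A-Z]))(?=[A-Z])";
    // Split before a capital followed by a lowercase letter, except at the start.
    static final String SECOND = "(?<!^)(?=[A-Z][a-z])";
    // Either condition is enough to split.
    static final String FULL = FIRST + "|" + SECOND;

    static String[] split(String s, String regex) {
        return s.split(regex);
    }

    public static void main(String[] args) {
        String input = "eclipseRCPExt";
        System.out.println("first:  " + Arrays.toString(split(input, FIRST)));  // [eclipse, RCPExt]
        System.out.println("second: " + Arrays.toString(split(input, SECOND))); // [eclipseRCP, Ext]
        System.out.println("full:   " + Arrays.toString(split(input, FULL)));   // [eclipse, RCP, Ext]

        // The combined pattern on every example from the question.
        for (String s : new String[]{"value", "camelValue", "TitleValue", "VALUE", "eclipseRCPExt"}) {
            System.out.println(s + " -> " + Arrays.toString(split(s, FULL)));
        }
    }
}
```

Each clause alone finds only one of the two boundaries in "eclipseRCPExt", while the alternation finds both. Note that `[A-Z]` and `[a-z]` cover ASCII letters only, so identifiers with accented or other non-ASCII letters are outside what this sketch checks.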

NPE answered Sep 28 '22 04:09