I'm using the regex
System.Text.RegularExpressions.Regex.Replace(stringToSplit, "([A-Z])", " $1").Trim()
to split strings by capital letter, for example:
'MyNameIsSimon' becomes 'My Name Is Simon'
I find this incredibly useful when working with enumerations. What I would like to do is change it slightly so that strings are only split if the next letter is a lowercase letter, for example:
'USAToday' would become 'USA Today'
Can this be done?
EDIT: Thanks to all for responding. I may not have entirely thought this through: in some cases 'A' and 'I' would need to be ignored, but that is not possible (at least not in a meaningful way). In my case, though, the answers below do what I need. Thanks!
To split a string on capital letters, call the split() method with the regular expression /(?=[A-Z])/. The pattern uses a positive lookahead assertion, so the string is split in front of each capital letter without consuming it, and split() returns an array of the substrings.
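A minimal Python sketch of the same lookahead split, for illustration only. The helper name split_before_capitals is hypothetical, and it assumes Python 3.7 or later, where re.split splits on zero-width matches. It also shows why this pattern alone does not help with 'USAToday': every capital starts a new piece.

```python
import re

def split_before_capitals(text):
    # Hypothetical helper: split in front of each capital letter.
    # The lookahead (?=[A-Z]) is zero-width, so no character is consumed.
    # Requires Python 3.7+, where re.split honours empty matches.
    # A match at position 0 yields a leading empty string, filtered out here.
    return [part for part in re.split(r"(?=[A-Z])", text) if part]

print(split_before_capitals("MyNameIsSimon"))  # ['My', 'Name', 'Is', 'Simon']
print(split_before_capitals("USAToday"))       # ['U', 'S', 'A', 'Today']
```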
Using character sets: for example, the regular expression "[A-Za-z]" matches any single uppercase or lowercase letter. In a character set, a hyphen indicates a range of characters; for example, [A-Z] matches any one capital letter.
The split(String regex) method splits this string around matches of the given regular expression. It works as if by invoking the two-argument method split(String regex, int limit) with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
There's nothing "wrong" with it if it's supposed to split at the capital letters. The "1" and "@" aren't capital letters. It sounds like the problem would be more accurately stated as "I need to split at any character that is not followed by a lower case letter."
Regex to split a string into words with multiple word-boundary delimiters: in this example, we use the [\b\W\b]+ regex pattern to handle any non-alphanumeric delimiters. Inside a character class, \b denotes a backspace rather than a word boundary, so the pattern behaves essentially like \W+. With it we can split a string on multiple delimiters, producing a list of alphanumeric word tokens.
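As a rough Python sketch of splitting on non-alphanumeric delimiters: the helper name split_words is hypothetical, and the plain \W+ pattern is used here instead of the bracketed form quoted above.

```python
import re

def split_words(text):
    # Hypothetical helper: split on runs of non-word characters.
    # \W matches anything other than letters, digits and underscore.
    # A trailing delimiter produces an empty string, which is dropped.
    return [token for token in re.split(r"\W+", text) if token]

print(split_words("one,two; three-four!"))  # ['one', 'two', 'three', 'four']
```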
Edit: Splitting at any character that is not followed by a lowercase letter is going to have the unintended consequence of adding a space at the end of the string.
If maxsplit is 2, at most two splits occur, and the remainder of the string is returned as the final element of the list. flags: by default, no flags are applied. Many regex flags are available; for example, re.I performs case-insensitive matching.
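A short Python sketch of both parameters, using the standard re.split(pattern, string, maxsplit=0, flags=0) signature. The sample strings and variable names are made up for illustration.

```python
import re

# maxsplit=2: at most two splits, the rest stays in the last element.
limited = re.split(r"\s+", "split this string please", maxsplit=2)
print(limited)  # ['split', 'this', 'string please']

# flags=re.I: the delimiter "and" matches regardless of case.
case_insensitive = re.split(r"and", "breadANDbutterandjam", flags=re.I)
print(case_insensitive)  # ['bread', 'butter', 'jam']
```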
((?<=[a-z])[A-Z]|[A-Z](?=[a-z]))
or its Unicode-aware cousin
((?<=\p{Ll})\p{Lu}|\p{Lu}(?=\p{Ll}))
when replaced globally with
" $1"
handles
TodayILiveInTheUSAWithSimon USAToday IAmSOOOBored
yielding
Today I Live In The USA With Simon USA Today I Am SOOO Bored
In a second step you'd have to trim the string.
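The two steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the answer's own code: the function name split_camel_case is hypothetical, only the ASCII pattern is used because Python's built-in re module does not support \p{Lu} and \p{Ll}, and Python writes the replacement as " \1" where .NET writes " $1". The same pattern string should work with Regex.Replace in C#, followed by Trim().

```python
import re

# A capital preceded by a lowercase letter, or a capital followed by one.
# Capitals inside an acronym such as USA match neither alternative.
ACRONYM_AWARE = re.compile(r"((?<=[a-z])[A-Z]|[A-Z](?=[a-z]))")

def split_camel_case(text):
    # Step 1: put a space in front of every matching capital.
    spaced = ACRONYM_AWARE.sub(r" \1", text)
    # Step 2: trim the leading space added before an initial capital.
    return spaced.strip()

print(split_camel_case("USAToday"))       # USA Today
print(split_camel_case("MyNameIsSimon"))  # My Name Is Simon
print(split_camel_case("IAmSOOOBored"))   # I Am SOOO Bored
```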