Split a string at uppercase letters

What is the pythonic way to split a string before the occurrences of a given set of characters?

For example, I want to split 'TheLongAndWindingRoad' at any occurrence of an uppercase letter (possibly except the first), and obtain ['The', 'Long', 'And', 'Winding', 'Road'].

Edit: It should also split single occurrences, i.e. from 'ABC' I'd like to obtain ['A', 'B', 'C'].

Federico A. Ramponi asked Feb 17 '10 00:02



2 Answers

Before Python 3.7 it was not possible to split on a zero-width match with re.split. On any version, you can use re.findall instead:

>>> import re
>>> re.findall('[A-Z][^A-Z]*', 'TheLongAndWindingRoad')
['The', 'Long', 'And', 'Winding', 'Road']
>>> re.findall('[A-Z][^A-Z]*', 'ABC')
['A', 'B', 'C']
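On Python 3.7 and later, re.split accepts patterns that can match an empty string, so a lookahead can also be used to split directly before each uppercase letter. The following is a minimal sketch under that version assumption; the helper name split_before_uppercase is made up for illustration and only handles ASCII uppercase letters.

```python
import re

def split_before_uppercase(text):
    """Split text before every ASCII uppercase letter (assumes Python 3.7+)."""
    # (?=[A-Z]) is a zero-width lookahead: it matches the position just
    # before an uppercase letter without consuming the letter itself.
    parts = re.split(r"(?=[A-Z])", text)
    # A string that starts with an uppercase letter yields an empty first
    # element, so empty pieces are filtered out.
    return [part for part in parts if part]

print(split_before_uppercase("TheLongAndWindingRoad"))
print(split_before_uppercase("ABC"))
```

Unlike the findall pattern, this keeps a leading lowercase run: 'theLongRoad' gives ['the', 'Long', 'Road'].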
Mark Byers answered Sep 22 '22 20:09


Here is an alternative regex solution. The problem can be rephrased as "how do I insert a space before each uppercase letter, before doing the split":

>>> import re
>>> s = "TheLongAndWindingRoad ABC A123B45"
>>> re.sub(r"([A-Z])", r" \1", s).split()
['The', 'Long', 'And', 'Winding', 'Road', 'A', 'B', 'C', 'A123', 'B45']

This has the advantage of preserving all non-whitespace characters, which most other solutions do not.
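To make the difference concrete, here is a small comparison on an input that starts with a lowercase run. The input string is just an illustrative choice; the two patterns are the ones from the answers on this page.

```python
import re

s = "theLongAndWindingRoad"

# findall pattern: every piece must start with an uppercase letter,
# so the leading lowercase "the" is silently dropped.
found = re.findall(r"[A-Z][^A-Z]*", s)

# sub + split: a space is inserted before each uppercase letter and the
# result is split on whitespace, so "the" survives as its own piece.
spaced = re.sub(r"([A-Z])", r" \1", s).split()

print(found)   # ['Long', 'And', 'Winding', 'Road']
print(spaced)  # ['the', 'Long', 'And', 'Winding', 'Road']
```

One caveat: because the final step is a whitespace split, any whitespace already in the input also acts as a separator and is not preserved.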

Dave Kirby answered Sep 22 '22 20:09