
Python: convert camel case to space delimited using RegEx and taking Acronyms into account

Tags: python, regex

I am trying to convert camel case to space-separated words using Python. For example:

divLineColor -> div Line Color

This line does that successfully:

label = re.sub(r"([A-Z])", r" \g<0>", label)

The problem is with strings like simpleBigURL, which should become:

simpleBigURL -> simple Big URL

I am not entirely sure how to get this result. Help!
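Running the working one-liner on both inputs shows where it breaks: every capital gets its own leading space, so the acronym is split letter by letter. The helper name `split_every_capital` is hypothetical, and raw strings are used so `\g<0>` is passed to `re` untouched.

```python
import re

def split_every_capital(label):
    # Insert a space before every uppercase letter; \g<0> is the whole match.
    return re.sub(r"([A-Z])", r" \g<0>", label)

print(split_every_capital("divLineColor"))  # div Line Color
print(split_every_capital("simpleBigURL"))  # simple Big U R L
```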


This is one thing that I tried:

label = re.sub(r"([a-z])([A-Z])", r"\g<0> \g<1>", label)

But this produces weird results like:

divLineColor -> divL vineC eolor
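The odd output comes from the replacement string: `\g<0>` is the entire two-character match (for example `vL`), while `\g<1>` is only the lowercase letter, so each `vL` turns into `vL v`. A minimal sketch (the helper name `split_lower_upper` is hypothetical) that refers to groups 1 and 2 instead fixes the plain camel-case split, but still leaves an acronym glued to the word after it:

```python
import re

def split_lower_upper(label):
    # Group 1 is a lowercase letter, group 2 the uppercase letter right after it;
    # put a single space between them.
    return re.sub(r"([a-z])([A-Z])", r"\1 \2", label)

# The original attempt, for comparison: \g<0> is the whole match.
print(re.sub(r"([a-z])([A-Z])", r"\g<0> \g<1>", "divLineColor"))  # divL vineC eolor

for s in ["divLineColor", "simpleBigURL", "OldHTMLFile", "SQLServer"]:
    print(s, "->", split_lower_upper(s))
```

This handles `divLineColor` and `simpleBigURL`, but gives `Old HTMLFile` and leaves `SQLServer` unchanged, because an uppercase-to-uppercase boundary never matches.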

I was also thinking that a negative lookahead, (?!...), could work, but I have not had any luck.
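Lookaround assertions can do the job (positive ones rather than `(?!...)`) if the pattern matches only the empty position between two words and a single space is substituted there. A sketch under that approach; `BOUNDARY` and `split_camel` are hypothetical names. The first alternative marks a lowercase-to-uppercase boundary; the second marks the spot inside a run of capitals where the last capital starts a new lowercase word.

```python
import re

# Zero-width boundaries: nothing is consumed, so the replacement is just " ".
BOUNDARY = re.compile(
    r"(?<=[a-z])(?=[A-Z])"        # between a lowercase and an uppercase letter
    r"|(?<=[A-Z])(?=[A-Z][a-z])"  # before the capital that starts a word after an acronym
)

def split_camel(label):
    return BOUNDARY.sub(" ", label)

for s in ["divLineColor", "simpleBigURL", "OldHTMLFile", "SQLServer"]:
    print(s, "->", split_camel(s))
```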

asked Feb 16 '11 19:02 by sixtyfootersdude



1 Answer

This should work with 'divLineColor', 'simpleBigURL', 'OldHTMLFile' and 'SQLServer'.

label = re.sub(r'((?<=[a-z])[A-Z]|(?<!\A)[A-Z](?=[a-z]))', r' \1', label) 
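A quick check of the one-liner against the four inputs named above; wrapping it in a function called `camel_to_words` is only for illustration.

```python
import re

def camel_to_words(label):
    # Space before an uppercase letter that follows a lowercase one, or before an
    # uppercase letter (not at the start) that is followed by a lowercase one.
    return re.sub(r'((?<=[a-z])[A-Z]|(?<!\A)[A-Z](?=[a-z]))', r' \1', label)

for s in ['divLineColor', 'simpleBigURL', 'OldHTMLFile', 'SQLServer']:
    print(s, '->', camel_to_words(s))
```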

Explanation:

label = re.sub(r"""
        (            # start the group
                     # alternative 1
        (?<=[a-z])   # current position is preceded by a lower char
                     # (positive lookbehind: does not consume any char)
        [A-Z]        # an upper char
        |            # or
                     # alternative 2
        (?<!\A)      # current position is not at the beginning of the string
                     # (negative lookbehind: does not consume any char)
        [A-Z]        # an upper char
        (?=[a-z])    # matches if next char is a lower char
                     # (lookahead assertion: does not consume any char)
        )            # end the group""",
    r' \1', label, flags=re.VERBOSE)

If a match is found, it is replaced with ' \1': a leading space followed by the match itself (group 1 spans the whole match).
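Since group 1 covers the entire match, the outer parentheses only exist so that `\1` has something to refer to. In a replacement string, `\g<0>` always means the whole match, so a sketch without the capturing group (hypothetical helper names) should give identical results:

```python
import re

PATTERN_GROUPED = r'((?<=[a-z])[A-Z]|(?<!\A)[A-Z](?=[a-z]))'
PATTERN_PLAIN = r'(?<=[a-z])[A-Z]|(?<!\A)[A-Z](?=[a-z])'

def with_group(label):
    # \1 refers to the outer capturing group
    return re.sub(PATTERN_GROUPED, r' \1', label)

def with_whole_match(label):
    # \g<0> is the entire match, so no capturing group is needed
    return re.sub(PATTERN_PLAIN, r' \g<0>', label)

for s in ['divLineColor', 'simpleBigURL', 'OldHTMLFile', 'SQLServer']:
    print(with_group(s), '|', with_whole_match(s))
```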

Alternative 1 matches an uppercase character, but only if it is preceded by a lowercase character. We want to translate abYZ to ab YZ, not to ab Y Z.
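To see what the lookbehind in alternative 1 buys, compare it on `abYZ` with a naive "space before every capital" substitution (both helper names are hypothetical):

```python
import re

def alt1_only(label):
    # Alternative 1 alone: an uppercase letter preceded by a lowercase letter
    return re.sub(r'((?<=[a-z])[A-Z])', r' \1', label)

def every_capital(label):
    # Naive version: a space before every uppercase letter
    return re.sub(r'([A-Z])', r' \1', label)

print(alt1_only('abYZ'))      # ab YZ
print(every_capital('abYZ'))  # ab Y Z
print(alt1_only('ABCyz'))     # ABCyz
```

Alternative 1 alone never splits `ABCyz`, since no capital there follows a lowercase letter; that case is what alternative 2 is for.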

Alternative 2 matches an uppercase character, but only if it is followed by a lowercase character and is not at the start of the string. We want to translate ABCyz to AB Cyz, not to A B Cyz.
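Alternative 2 in isolation, next to a variant without the `(?<!\A)` guard, shows why the start-of-string check is there: without it, a capitalized input such as `Abc` gains a leading space. Helper names are hypothetical.

```python
import re

def alt2_only(label):
    # Alternative 2 alone: an uppercase letter, not at the start of the string,
    # that is followed by a lowercase letter
    return re.sub(r'((?<!\A)[A-Z](?=[a-z]))', r' \1', label)

def alt2_without_start_check(label):
    # Same, but without the (?<!\A) guard
    return re.sub(r'([A-Z](?=[a-z]))', r' \1', label)

print(repr(alt2_only('ABCyz')))               # 'AB Cyz'
print(repr(alt2_only('Abc')))                 # 'Abc'
print(repr(alt2_without_start_check('Abc')))  # ' Abc'
```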

answered Oct 03 '22 10:10 by Matthias
