Fast way to split alpha and numeric chars in a python string


I am trying to work out a simple function to fix typos where letters and numbers run together, e.g.:

"Westminister15" "Westminister15London" "23Westminister15London"

After fixing, these should become:


["Westminister", "15"] ["Westminister", "15", "London"] ["23", "Westminister", "15", "London"]

First attempt:


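```python
import re

def fixate(query):
    # Splitting on non-digits (\D) leaves only the digit runs...
    digit_pattern = re.compile(r'\D')
    # ...and splitting on digits (\d) leaves only the non-digit runs.
    alpha_pattern = re.compile(r'\d')
    # filter(None, ...) drops the empty strings left between adjacent separators
    digits = list(filter(None, digit_pattern.split(query)))
    alphas = list(filter(None, alpha_pattern.split(query)))
    print(digits)
    print(alphas)
```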

Result:


```python
>>> fixate("Westminister15London")
['15']
['Westminister', 'London']
```

However, I think this could be done more efficiently, and I still get wrong results when I try input like:


```python
>>> fixate("Westminister15London England")
['15']
['Westminister', 'London England']
```

Obviously it should list London and England separately, but I feel my function will end up overly patched and that there's a simpler approach.

This question is roughly equivalent to this PHP question.


asked Sep 13 '12 15:09

Hedde van der Heide

1 Answer

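The problem is that, before Python 3.7, re.split() doesn't split on zero-length matches, so it cannot cut the string at the boundary between a letter and a digit without consuming a character. But you can get the desired result with re.findall(), which returns the matched runs themselves rather than the text between separators: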


```python
>>> import re
>>> re.findall(r"[^\W\d_]+|\d+", "23Westminister15London")
['23', 'Westminister', '15', 'London']
>>> re.findall(r"[^\W\d_]+|\d+", "Westminister15London England")
['Westminister', '15', 'London', 'England']
```

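\d+ matches one or more digits, and [^\W\d_]+ matches one or more letters, i.e. any word character (\w) that is not a digit or an underscore. Everything else, such as the space, is skipped.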
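As a minimal sketch, the answer's pattern can be wrapped into the question's fixate function so it returns one ordered list instead of printing two separate ones. The split_boundaries helper and its BOUNDARY lookaround pattern are illustrative additions, not part of the answer: they assume Python 3.7+, where re.split() does split on zero-length matches, and show that splitting only at letter/digit boundaries still leaves "London England" in one piece, which is why findall fits this problem better.

```python
import re

# One alternation scanned left to right, so tokens keep their original order:
#   [^\W\d_]+  one or more word characters that are not digits or "_"
#   \d+        one or more digits
# Anything else (spaces, punctuation, underscores) is skipped.
TOKEN = re.compile(r"[^\W\d_]+|\d+")


def fixate(query):
    """Split query into its runs of letters and runs of digits, in order."""
    return TOKEN.findall(query)


# Hypothetical comparison helper (Python 3.7+ only): split at the zero-width
# positions between a digit and a non-digit. Whitespace is not a boundary
# here, so "London England" is not separated.
BOUNDARY = re.compile(r"(?<=\d)(?=\D)|(?<=\D)(?=\d)")


def split_boundaries(query):
    return BOUNDARY.split(query)


if __name__ == "__main__":
    print(fixate("23Westminister15London"))
    # ['23', 'Westminister', '15', 'London']
    print(fixate("Westminister15London England"))
    # ['Westminister', '15', 'London', 'England']
    print(split_boundaries("Westminister15London England"))
    # ['Westminister', '15', 'London England']
```

Compiling the pattern once at module level avoids re-parsing it on every call, though the re module also caches recently used patterns internally.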


answered Oct 20 '22 00:10

Tim Pietzcker
