I am trying to write a simple function to fix typos in which words and numbers are run together, e.g.:
"Westminister15" "Westminister15London" "23Westminister15London"
After fixating, they should become:
["Westminister", "15"] ["Westminister", "15", "London"] ["23", "Westminister", "15", "London"]
First attempt:
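    import re

    def fixate(query):
        digit_pattern = re.compile(r'\D')  # splitting on non-digits leaves the digit runs
        alpha_pattern = re.compile(r'\d')  # splitting on digits leaves the non-digit runs
        # filter(None, ...) drops the empty strings that split() produces
        digits = list(filter(None, digit_pattern.split(query)))
        alphas = list(filter(None, alpha_pattern.split(query)))
        print(digits)
        print(alphas)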
Result:

    >>> fixate("Westminister15London")
    ['15']
    ['Westminister', 'London']
However, I think this could be done more effectively, and I still get bad results when I try something like:
    >>> fixate("Westminister15London England")
    ['15']
    ['Westminister', 'London England']
Obviously it should list London and England separately, but I feel my function will get overly patched, and there's a simpler approach.
This question is roughly equivalent to this PHP question.
To split a string on multiple delimiters in Python, use re.split() with a pattern that matches any of them. re.split() splits the string at each occurrence of the pattern.
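A minimal sketch of re.split() with several delimiters; the sample strings are made up for illustration. The second call shows that a capturing group keeps the delimiters in the result, and also why splitting on digits alone leaves "London England" in one piece.

```python
import re

text = "a, b;c  d"
# A character class lists several single-character delimiters; "+" merges runs of them.
parts = re.split(r"[;,\s]+", text)
print(parts)  # ['a', 'b', 'c', 'd']

# A capturing group keeps the matched delimiters (here, digit runs) in the result.
query = "Westminister15London England"
pieces = re.split(r"(\d+)", query)
print(pieces)  # ['Westminister', '15', 'London England']
```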
To split a string into chunks of a fixed number of characters, combine range() with slicing. For example, to split a string into pieces of 2 characters:

    s = 'ABCDEFG'
    n = 2
    # step through the string n characters at a time; the last chunk may be shorter
    res = [s[i:i + n] for i in range(0, len(s), n)]
    print(res)  # ['AB', 'CD', 'EF', 'G']
For simple splitting, Python strings provide a split() method. It breaks a string at each occurrence of a given separator and returns a list of the pieces; with no separator, it splits on runs of whitespace.
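A short sketch of how str.split() behaves with and without a separator, and with the maxsplit argument; the inputs are illustrative.

```python
# With an explicit separator, consecutive separators produce empty strings.
csv_parts = "a,b,,c".split(",")
print(csv_parts)  # ['a', 'b', '', 'c']

# With no argument, split() splits on runs of whitespace and ignores
# leading/trailing whitespace.
words = "  London   England ".split()
print(words)  # ['London', 'England']

# maxsplit limits how many splits are made.
head_tail = "a,b,c".split(",", 1)
print(head_tail)  # ['a', 'b,c']
```

Note that split() with no argument would separate "London England" but cannot separate "Westminister15", which is why the question needs a regular expression.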
If a string contains only numbers separated by spaces, you can simply split it at the spaces with str.split().
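A minimal sketch, assuming the input really contains nothing but space-separated integers (int() raises ValueError otherwise):

```python
line = "23 15 2024"
# split() yields the number strings; int() converts each one.
numbers = [int(token) for token in line.split()]
print(numbers)  # [23, 15, 2024]
```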
When processing text, you may need to extract the numbers it contains. In Python, text is handled as strings, so the task is to find the numbers in a string and split them out. While extracting the numbers, we can classify the string into two types.
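The two types are not spelled out here; one plausible reading is numbers that stand as separate words versus numbers glued to letters, as in the question. The sketch below, with made-up sample strings, handles each case. It uses str.isdecimal() rather than isdigit() because isdigit() also accepts characters such as "²" that int() cannot parse.

```python
import re

# Case 1 (assumed): numbers are separate, space-delimited words.
spaced = "flat 3 floor 12 block B"
spaced_numbers = [int(word) for word in spaced.split() if word.isdecimal()]
print(spaced_numbers)  # [3, 12]

# Case 2 (assumed): numbers are glued to letters, so split() alone cannot isolate them;
# re.findall() collects every run of digits instead.
glued = "23Westminister15London"
glued_numbers = [int(run) for run in re.findall(r"\d+", glued)]
print(glued_numbers)  # [23, 15]
```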
The problem is that in Python versions before 3.7, re.split() doesn't split on zero-length matches, such as the boundary between a letter and a digit. You can get the desired result with re.findall() instead:
    >>> import re
    >>> re.findall(r"[^\W\d_]+|\d+", "23Westminister15London")
    ['23', 'Westminister', '15', 'London']
    >>> re.findall(r"[^\W\d_]+|\d+", "Westminister15London England")
    ['Westminister', '15', 'London', 'England']
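\d+ matches a run of one or more digits, and [^\W\d_]+ matches a run of one or more letters (word characters that are neither digits nor underscores). Characters matched by neither alternative, such as spaces, are skipped.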
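A minimal sketch of the question's fixate() rebuilt on the re.findall() pattern above, returning a list instead of printing; the constant names TOKEN and BOUNDARY are illustrative. It also includes fixate_split(), a hypothetical alternative that relies on re.split() splitting on zero-width lookaround matches, which only works on Python 3.7 or later.

```python
import re

# Runs of letters (word characters minus digits and "_") or runs of digits.
# Anything matched by neither alternative (spaces, punctuation, "_") is skipped.
TOKEN = re.compile(r"[^\W\d_]+|\d+")

# Python 3.7+ only: split on whitespace, or at a zero-width boundary between
# a digit and a character that is neither a digit nor whitespace (either order).
BOUNDARY = re.compile(r"\s+|(?<=\d)(?=[^\d\s])|(?<=[^\d\s])(?=\d)")


def fixate(query):
    """Return the letter runs and digit runs of query, in order."""
    return TOKEN.findall(query)


def fixate_split(query):
    """Split with re.split(); drop empty strings left by leading/trailing whitespace."""
    return [part for part in BOUNDARY.split(query) if part]


if __name__ == "__main__":
    for q in ["Westminister15", "Westminister15London",
              "23Westminister15London", "Westminister15London England"]:
        print(fixate(q), fixate_split(q))
```

The two versions differ on punctuation: fixate("Westminister15-London") drops the "-" and returns ['Westminister', '15', 'London'], whereas fixate_split() keeps it attached and returns ['Westminister', '15', '-London']. Because \W, \w and \d are Unicode-aware for str patterns in Python 3, fixate("Zürich8001Straße") returns ['Zürich', '8001', 'Straße'].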