I am trying to clean text strings using Python's partition and regular expressions. For example:
testString = 'Tre Bröders Väg 6 2tr'
sep = '[0-9]tr'
head,sep,tail = testString.partition(sep)
head
>>>'Tre Br\xc3\xb6ders V\xc3\xa4g 6 2tr'
The head still contains the 2tr that I want to remove. I'm not that good with regex, but shouldn't [0-9] do the trick?
The output I would expect from this example would be
head
>>> 'Tre Br\xc3\xb6ders V\xc3\xa4g 6'
str.partition does not support regex, so when you give it a string like '[0-9]tr', it looks for that exact string in testString to partition on; it does not use any regex.
According to the documentation of str.partition -
Split the string at the first occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing the string itself, followed by two empty strings.
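To make the documented behaviour concrete, here is a small sketch using the string from the question. The variable names not_found and found are only for illustration.

```python
test_string = 'Tre Bröders Väg 6 2tr'

# The separator is matched literally, so the regex-looking text is never found
# and the whole string comes back as the first element.
not_found = test_string.partition('[0-9]tr')

# A literal separator that really occurs in the string splits as documented.
found = test_string.partition(' 2tr')

print(not_found)
print(found)
```

The first call returns the original string followed by two empty strings, which is exactly what the question observed in head.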
And since you say you just want the head, you can use the re.split() method from the re module, with maxsplit set to 1, and then take its first element, which should be equivalent to what you were trying with str.partition. Example -
import re
testString = 'Tre Bröders Väg 6 2tr'
sep = '[0-9]tr'
head = re.split(sep,testString,1)[0]
Demo -
>>> import re
>>> testString = 'Tre Bröders Väg 6 2tr'
>>> sep = '[0-9]tr'
>>> head = re.split(sep,testString,1)[0]
>>> head
'Tre Bröders Väg 6 '
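Note that the head in the demo keeps the trailing space before the marker. If that space is unwanted (an assumption about the goal, since the question's expected output is cut off), one possible variation is to let the pattern absorb the preceding whitespace. This is only a sketch, and the names test_string and head are illustrative.

```python
import re

test_string = 'Tre Bröders Väg 6 2tr'

# \s* consumes any whitespace directly in front of the "<digit>tr" marker,
# so the head comes back without a trailing space.
head = re.split(r'\s*[0-9]tr', test_string, maxsplit=1)[0]

print(repr(head))
```

Calling head.rstrip() on the original result would work as well; folding the whitespace into the pattern just keeps it to one step.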
Plain re.split() method
You can extract the head by using re.split().
import re
testString = 'Tre Bröders Väg 6 2tr'
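sep = r'[0-9]tr' # raw string: not required for this pattern (no backslashes), but a good habit for regexes
head, tail = re.split(sep, testString, maxsplit=1) # maxsplit=1 guarantees at most two parts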
head.strip()
>>>'Tre Bröders Väg 6'
Chocolate sprinkled re.split() method
If you capture sep with (), re.split() behaves like a pseudo re.partition(). (There is no such method in Python, actually...)
import re
testString = 'Tre Bröders Väg 6 2tr'
sep = r'([0-9]tr)' # "()" added.
# maxsplit of 1 is added at the suggestion of Ángel ;)
head, sep, tail = re.split(sep, testString, 1)
head, sep, tail
>>>('Tre Bröders Väg 6 ', '2tr', '')
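One caveat with unpacking the result of re.split() into three names: when the pattern does not match, re.split() returns a one-element list and the unpacking raises ValueError. A minimal sketch of a helper that mirrors str.partition's not-found behaviour, built on re.search(), could look like this. The name re_partition is hypothetical and not part of the re module.

```python
import re

def re_partition(pattern, text):
    """Hypothetical helper: like str.partition, but the separator is a regex."""
    match = re.search(pattern, text)
    if match is None:
        # Same convention as str.partition: the string plus two empty strings.
        return text, '', ''
    return text[:match.start()], match.group(0), text[match.end():]

print(re_partition(r'[0-9]tr', 'Tre Bröders Väg 6 2tr'))
print(re_partition(r'[0-9]tr', 'Tre Bröders Väg 6'))
```

With this, head, sep, tail = re_partition(...) unpacks cleanly whether or not the marker is present.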