Python partition string with regular expressions

I am trying to clean text strings using Python's partition and regular expressions. For example:

testString = 'Tre Bröders Väg 6 2tr'
sep = '[0-9]tr'
head,sep,tail = testString.partition(sep)
head
>>>'Tre Br\xc3\xb6ders V\xc3\xa4g 6 2tr'

The head still contains the 2tr that I want to remove. I'm not that good with regex, but shouldn't [0-9] do the trick?

The output I would expect from this example would be

head
>>> 'Tre Br\xc3\xb6ders V\xc3\xa4g 6'
asked Sep 26 '15 10:09 by seb



2 Answers

str.partition does not support regular expressions. When you pass it a string such as '[0-9]tr', it searches testString for that exact literal text; the brackets are not interpreted as a character class, so nothing is found and the whole string comes back as the head.

According to the documentation of str.partition:

Split the string at the first occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing the string itself, followed by two empty strings.
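To make the quoted behaviour concrete, here is a small sketch (assuming Python 3, so the output shows decoded characters rather than the byte escapes in the question). The variable names are only illustrative.

```python
# str.partition treats the separator as plain text, never as a pattern.
test_string = 'Tre Bröders Väg 6 2tr'

# The literal text '[0-9]tr' does not occur in the string,
# so the whole string is returned as the head.
not_found = test_string.partition('[0-9]tr')
print(not_found)  # ('Tre Bröders Väg 6 2tr', '', '')

# A literal separator that does occur splits as expected.
found = test_string.partition('2tr')
print(found)  # ('Tre Bröders Väg 6 ', '2tr', '')
```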

Since you only want the head, you can use re.split() from the re module with maxsplit set to 1 and take the first element of the result, which is equivalent to what you were attempting with str.partition. Example:

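import re
testString = 'Tre Bröders Väg 6 2tr'
sep = r'[0-9]tr'  # one digit followed by "tr"; a raw string is a good habit for patterns
# Split at the first match only and keep the part before it.
# maxsplit is passed by keyword; passing it positionally is deprecated since Python 3.13.
head = re.split(sep, testString, maxsplit=1)[0]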

Demo:

>>> import re
>>> testString = 'Tre Bröders Väg 6 2tr'
>>> sep = '[0-9]tr'
>>> head = re.split(sep,testString,1)[0]
>>> head
'Tre Bröders Väg 6 '
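A short sketch of why maxsplit matters and what happens when the pattern does not match, assuming Python 3. The input with two floor markers is made up for illustration and is not from the question.

```python
import re

pattern = r'[0-9]tr'

# Hypothetical input containing two matches of the pattern.
two_markers = 'Väg 6 2tr eller 3tr'

# Without maxsplit, every match is a split point.
parts_unlimited = re.split(pattern, two_markers)
print(parts_unlimited)  # ['Väg 6 ', ' eller ', '']

# With maxsplit=1, only the first match splits the string.
parts_limited = re.split(pattern, two_markers, maxsplit=1)
print(parts_limited)  # ['Väg 6 ', ' eller 3tr']

# When nothing matches, the result is a one-element list,
# so taking index 0 still yields the original string.
no_marker = re.split(pattern, 'Tre Bröders Väg 6', maxsplit=1)
print(no_marker)  # ['Tre Bröders Väg 6']

# The head keeps the space before the marker; rstrip() removes it.
head = parts_limited[0].rstrip()
print(head)  # 'Väg 6'
```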
answered Oct 29 '22 17:10 by Anand S Kumar


Plain re.split() method

You can extract the head by using re.split().

import re

testString = 'Tre Bröders Väg 6 2tr'
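sep = r'[0-9]tr'  # raw string is a good habit for regexes, though this pattern has no backslashes
# maxsplit=1 gives exactly two parts when sep matches; unpacking fails if it does not match
head, tail = re.split(sep, testString, maxsplit=1)
head.strip()
>>>'Tre Bröders Väg 6'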

Chocolate sprinkled re.split() method

If you wrap sep in a capturing group with (), re.split() also returns the matched separator, so it behaves like a pseudo re.partition(). (There is no such function in Python's re module, actually...)

import re

testString = 'Tre Bröders Väg 6 2tr'
sep = r'([0-9]tr)'  # "()" added.
# maxsplit of 1 is added at the suggestion of Ángel ;)
head, sep, tail = re.split(sep, testString, maxsplit=1)
head, sep, tail
>>>('Tre Bröders Väg 6 ', '2tr', '')
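If you also need the "separator not found" behaviour of str.partition, one possible approach is a small wrapper around re.search. This is only a sketch assuming Python 3; regex_partition is a hypothetical helper name, not something provided by the re module.

```python
import re

def regex_partition(pattern, text):
    """Hypothetical helper: like str.partition, but the separator is a regex."""
    match = re.search(pattern, text)
    if match is None:
        # Mirror str.partition: the whole string, then two empty strings.
        return text, '', ''
    # Text before the first match, the matched text, and the text after it.
    return text[:match.start()], match.group(0), text[match.end():]

result = regex_partition(r'[0-9]tr', 'Tre Bröders Väg 6 2tr')
print(result)  # ('Tre Bröders Väg 6 ', '2tr', '')

missing = regex_partition(r'[0-9]tr', 'Tre Bröders Väg 6')
print(missing)  # ('Tre Bröders Väg 6', '', '')
```

Unlike tuple-unpacking the result of re.split(), this always returns three items, so head, sep, tail = regex_partition(...) does not raise when the pattern is absent.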
answered Oct 29 '22 16:10 by yeiichi