 

String splitting in Python using regex

Tags: python, regex

I'm trying to split a string in Python so that I get everything before a certain regex.

Example string: "Some.File.Num10.example.txt"

I need everything before the part "Num10", which matches the regex r'Num\d\d' (the number will vary, and possibly so will what comes after it).

Any ideas on how to do this?

henkimon asked May 10 '12 23:05




3 Answers

>>> import re
>>> s = "Some.File.Num10.example.txt"
>>> p = re.compile(r"Num\d{2}")
>>> match = p.search(s)
>>> s[:match.start()]
'Some.File.'

This is more efficient than doing a split, because search() stops at the first match instead of continuing through the whole string. In your example it won't make a difference, since the strings are short, but if your string is very long and you know the match is near the beginning, this approach will be faster.

I just wrote a small program to profile search() and split() and confirmed the above assertion.
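The profiling program itself is not shown. A rough sketch of such a comparison follows, assuming a long synthetic string (built here purely for illustration) whose first match is near the start and whose tail contains many more matches. Timings depend on the machine and Python version, so none are quoted.

```python
import re
import timeit

# Hypothetical input: the first match is near the start, followed by a long
# tail containing many more matches for split() to process.
long_text = "Some.File.Num10." + "filler.Num99." * 50_000
pattern = re.compile(r"Num\d{2}")

def with_search() -> str:
    # search() stops scanning at the first match.
    return long_text[:pattern.search(long_text).start()]

def with_split() -> str:
    # split() without maxsplit splits at every later match as well.
    return pattern.split(long_text)[0]

# Both return the same prefix; only the amount of work differs.
assert with_search() == with_split() == "Some.File."

for fn in (with_search, with_split):
    print(fn.__name__, timeit.timeit(fn, number=10))
```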

varunl answered Oct 09 '22 15:10
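One caveat with the search() approach: search() returns None when the pattern does not occur, so calling match.start() would raise AttributeError. A minimal None-safe sketch follows. The helper name text_before and the looser pattern r"Num\d+" (allowing any number of digits, since the question says the number varies) are assumptions for illustration.

```python
import re
from typing import Optional

# Assumed pattern: "Num" followed by one or more digits.
NUM_PATTERN = re.compile(r"Num\d+")

def text_before(pattern: re.Pattern, s: str) -> Optional[str]:
    """Hypothetical helper: return everything before the first match of
    `pattern` in `s`, or None if the pattern never matches."""
    match = pattern.search(s)  # stops at the first match
    if match is None:
        return None
    return s[:match.start()]

print(text_before(NUM_PATTERN, "Some.File.Num10.example.txt"))  # Some.File.
print(text_before(NUM_PATTERN, "Some.File.Num7.other.tar.gz"))  # Some.File.
print(text_before(NUM_PATTERN, "no-number-here.txt"))           # None
```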


>>> import re
>>> text = "Some.File.Num10.example.txt"
>>> re.split(r'Num\d{2}',text)[0]
'Some.File.'

jamylak answered Oct 09 '22 14:10
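re.split() also accepts a maxsplit argument. Passing maxsplit=1 stops splitting after the first match, which avoids splitting the rest of a long string and also hands back what comes after the match. A short sketch using the question's example string:

```python
import re

text = "Some.File.Num10.example.txt"

# maxsplit=1: split at the first match only, then stop.
parts = re.split(r"Num\d{2}", text, maxsplit=1)
print(parts)     # ['Some.File.', '.example.txt']
print(parts[0])  # Some.File.

# With no match, re.split() returns a one-element list holding the whole
# string, so parts[0] is the full input rather than an error.
print(re.split(r"Num\d{2}", "plain.txt", maxsplit=1))  # ['plain.txt']
```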


You can use Python's re.split():

import re

my_str = "This is a string."

re.split(r"\W+", my_str)

['This', 'is', 'a', 'string', '']

Ignacio Vazquez-Abrams answered Oct 09 '22 13:10
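The example above shows two details of re.split(). A delimiter at the very end of the string produces a trailing empty string. Wrapping the pattern in a capturing group makes re.split() include the matched text in the result. A short sketch of both, reusing the strings from this page:

```python
import re

my_str = "This is a string."

# The final "." also matches \W+, leaving '' at the end of the list;
# filter out empty strings if they are unwanted.
words = [w for w in re.split(r"\W+", my_str) if w]
print(words)  # ['This', 'is', 'a', 'string']

# A capturing group keeps the matched delimiter in the output, which
# applies to the original question too.
print(re.split(r"(Num\d{2})", "Some.File.Num10.example.txt"))
# ['Some.File.', 'Num10', '.example.txt']
```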