 

Python regex split without empty string

Tags: python, regex

I have the following file names that exhibit this pattern:

000014_L_20111007T084734-20111008T023142.txt
000014_U_20111007T084734-20111008T023142.txt
...

I want to extract the middle two time stamp parts after the second underscore '_' and before '.txt'. So I used the following Python regex string split:

time_info = re.split(r'^[0-9]+_[LU]_|-|\.txt$', f)

But this gives me two extra empty strings in the returned list:

time_info=['', '20111007T084734', '20111008T023142', ''] 

How do I get only the two timestamps? That is, I want:

time_info=['20111007T084734', '20111008T023142'] 
tonga asked May 30 '13 16:05





2 Answers

I'm no Python expert but maybe you could just remove the empty strings from your list?

str_list = re.split(r'^[0-9]+_[LU]_|-|\.txt$', f)
# list() is needed on Python 3, where filter() returns an iterator
time_info = list(filter(None, str_list))
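For what it's worth, the empty strings show up because the pattern matches at the very start and the very end of the file name, and re.split() reports the (empty) text on the outer side of those matches. Here is a minimal sketch of the same filtering written as a list comprehension, which gives a list on both Python 2 and Python 3. The variable name parts is just an illustrative choice.

```python
import re

f = '000014_L_20111007T084734-20111008T023142.txt'

# The numeric prefix and the '.txt' suffix both match the pattern, so
# re.split() returns an empty string before the first match and after the last.
parts = re.split(r'^[0-9]+_[LU]_|-|\.txt$', f)

# Keep only the non-empty pieces.
time_info = [p for p in parts if p]
print(time_info)
```

This drops every empty piece, so it assumes you never need to tell a genuinely empty field apart from the split artifacts.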
Elliot Bonneville answered Sep 23 '22 10:09



Don't use re.split(); use the groups() method of the match object that re.search() returns.

>>> import re
>>> f = '000014_L_20111007T084734-20111008T023142.txt'
>>> time_info = re.search(r'[LU]_(\w+)-(\w+)\.', f).groups()
>>> time_info
('20111007T084734', '20111008T023142')

You can even name the capturing groups and retrieve them in a dict, though you use groupdict() rather than groups() for that. (The regex pattern for such a case would be something like r'[LU]_(?P<groupA>\w+)-(?P<groupB>\w+)\.')
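As a rough sketch of that named-group variant: groupA and groupB are only the placeholder names from the pattern above, and the None check is an added assumption about how you might want to handle file names that don't match.

```python
import re

f = '000014_L_20111007T084734-20111008T023142.txt'

# Same pattern as before, but with named capturing groups.
pattern = re.compile(r'[LU]_(?P<groupA>\w+)-(?P<groupB>\w+)\.')
match = pattern.search(f)

# search() returns None when nothing matches, so check before using the result.
time_info = match.groupdict() if match else {}
print(time_info)
```

With names that describe the fields (for example start and end, if that is what the timestamps mean), the resulting dict is easier to read than positional indexing into a tuple.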

JAB answered Sep 24 '22 10:09
