I have the following file names that exhibit this pattern:
000014_L_20111007T084734-20111008T023142.txt 000014_U_20111007T084734-20111008T023142.txt ...
I want to extract the two timestamps in the middle, after the second underscore '_' and before '.txt'. So I used the following Python regex split:
time_info = re.split(r'^[0-9]+_[LU]_|-|\.txt$', f)
But this gives me two extra empty strings in the returned list:
time_info=['', '20111007T084734', '20111008T023142', '']
How do I get only the two timestamps? I.e., I want:
time_info=['20111007T084734', '20111008T023142']
An alternative solution is to remove all empty strings from the resulting list with filter(), e.g. filter(bool, words), which drops the empty string '' and any other elements that evaluate to False, such as None. On Python 3, wrap the call in list(), because filter() returns an iterator there rather than a list.
The Python "ValueError: empty separator" occurs when we pass an empty string to the str.split() method. To solve the error, use the list() class if you need to get a list of characters, or pass a separator to the str.split() method, e.g. str.
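As a small illustration of the error described above, the sketch below triggers it on purpose and then shows the two suggested workarounds. The sample strings 'abc' and 'a,b,c' are made up for this example.

```python
message = None

# str.split() refuses an empty separator and raises ValueError.
try:
    'abc'.split('')
except ValueError as exc:
    message = str(exc)

# Workaround 1: list() yields the individual characters.
chars = list('abc')

# Workaround 2: pass a real, non-empty separator.
parts = 'a,b,c'.split(',')

print(message)
print(chars)
print(parts)
```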
I'm no Python expert but maybe you could just remove the empty strings from your list?
str_list = re.split(r'^[0-9]+_[LU]_|-|\.txt$', f)
time_info = list(filter(None, str_list))  # list() is needed on Python 3, where filter() returns an iterator
Don't use re.split(); use the groups() method of regex Match/SRE_Match objects.
>>> import re
>>> f = '000014_L_20111007T084734-20111008T023142.txt'
>>> time_info = re.search(r'[LU]_(\w+)-(\w+)\.', f).groups()
>>> time_info
('20111007T084734', '20111008T023142')
You can even name the capturing groups and retrieve them in a dict, though you use groupdict() rather than groups() for that. (The regex pattern for such a case would be something like r'[LU]_(?P<groupA>\w+)-(?P<groupB>\w+)\.')
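A minimal sketch of the named-group variant, using the sample file name from the question. The group names start and end are illustrative choices for this example, not names from the answer above, and the None check is an added assumption about how one might handle file names that do not fit the pattern.

```python
import re

# Sample file name taken from the question.
f = '000014_L_20111007T084734-20111008T023142.txt'

# 'start' and 'end' are hypothetical group names chosen for this sketch.
pattern = re.compile(r'[LU]_(?P<start>\w+)-(?P<end>\w+)\.')

match = pattern.search(f)
if match is None:
    # re.search() returns None when nothing matches, so guard before using it.
    raise ValueError('unexpected file name: ' + f)

# groupdict() maps each group name to the text it captured.
time_info = match.groupdict()
print(time_info)
```

With named groups, each timestamp can then be read by key, e.g. time_info['start'], instead of by position.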