Python regex split without empty string

Tags: python, regex

I have the following file names that exhibit this pattern:

000014_L_20111007T084734-20111008T023142.txt
000014_U_20111007T084734-20111008T023142.txt
...

I want to extract the two timestamps that come after the second underscore '_' and before '.txt'. So I used the following Python regex split:

time_info = re.split('^[0-9]+_[LU]_|-|\.txt$', f)

But this gives me two extra empty strings in the returned list:

time_info=['', '20111007T084734', '20111008T023142', '']

How do I get only the two timestamps? That is, I want:

time_info=['20111007T084734', '20111008T023142']


asked May 30 '13 16:05

tonga

2 Answers

I'm no Python expert but maybe you could just remove the empty strings from your list?

import re
str_list = re.split(r'^[0-9]+_[LU]_|-|\.txt$', f)
time_info = list(filter(None, str_list))  # list() is needed on Python 3, where filter() returns an iterator
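A minimal sketch of the same filtering idea written as a list comprehension, which gives a plain list on both Python 2 and Python 3. The variable name `parts` is only illustrative. The comments also note where the empty strings come from: `re.split()` emits an empty string whenever a separator matches at the very start or very end of the input.

```python
import re

f = '000014_L_20111007T084734-20111008T023142.txt'

# The prefix alternative matches at position 0 and the '.txt' alternative
# matches at the end of the string, so split() reports an empty piece
# before the first separator and after the last one.
parts = re.split(r'^[0-9]+_[LU]_|-|\.txt$', f)

# Keep only the non-empty pieces.
time_info = [p for p in parts if p]
print(time_info)
```

This drops every empty string, so it is only appropriate when an empty field can never be a meaningful value, which appears to be the case for these file names.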

answered Sep 23 '22 10:09

Elliot Bonneville

Don't use re.split(). Use re.search() with capturing groups instead, and call the groups() method of the resulting Match object (named SRE_Match in older Python versions).

>>> import re
>>> f = '000014_L_20111007T084734-20111008T023142.txt'
>>> time_info = re.search(r'[LU]_(\w+)-(\w+)\.', f).groups()
>>> time_info
('20111007T084734', '20111008T023142')

You can even name the capturing groups and retrieve them in a dict by calling groupdict() instead of groups(). The regex pattern for such a case would be something like r'[LU]_(?P<groupA>\w+)-(?P<groupB>\w+)\.'
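As a rough sketch of the named-group variant, the snippet below wraps the pattern in a small helper. The helper name `extract_times` is hypothetical and not part of the answer; the group names `groupA` and `groupB` are the ones suggested above. It also guards against `re.search()` returning `None` for a file name that does not fit the pattern, since calling `groupdict()` on `None` would raise an `AttributeError`.

```python
import re

# Compile once so the pattern can be reused across many file names.
PATTERN = re.compile(r'[LU]_(?P<groupA>\w+)-(?P<groupB>\w+)\.')

def extract_times(filename):
    """Return a dict of the two timestamps, or None if the name does not match.

    Hypothetical helper for illustration only.
    """
    match = PATTERN.search(filename)
    return match.groupdict() if match else None

print(extract_times('000014_U_20111007T084734-20111008T023142.txt'))
print(extract_times('notes.txt'))
```

The first call should produce a dict keyed by `groupA` and `groupB`; the second prints `None` because the name contains no `L_` or `U_` marker.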

answered Sep 24 '22 10:09

JAB
