Python/Regex - Match .#,#. in String

Tags:

python

regex

What regex can I use to match ".#,#." within a string? It may or may not exist in the string. Some examples with expected outputs might be:

Test1.0,0.csv      -> ('Test1', '0,0', 'csv')         (Basic Example)
Test2.wma          -> ('Test2', 'wma')                (No Match)
Test3.1100,456.jpg -> ('Test3', '1100,456', 'jpg')    (Basic with Large Number)
T.E.S.T.4.5,6.png  -> ('T.E.S.T.4', '5,6', 'png')     (Doesn't strip all periods)
Test5,7,8.sss      -> ('Test5,7,8', 'sss')            (No Match)
Test6.2,3,4.png    -> ('Test6.2,3,4', 'png')          (No Match, too many commas)
Test7.5,6.7,8.test -> ('Test7', '5,6', '7,8', 'test') (Double Match?)

The last one isn't too important, since I would expect .#,#. to appear only once. I would expect most of the files I'm processing to fall into the first through fourth examples, so I'm most interested in those.

Thanks for the help!

asked Sep 26 '12 18:09

Scott B

1 Answer

You can use the regex \.\d+,\d+\. to find all matches for that pattern, but you will need to do a little extra to get the output you expect, especially since you want to treat .5,6.7,8. as two matches.
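To see why the plain pattern needs extra work, here is a minimal sketch comparing it with a lookahead variant on the "double match" example. The names PLAIN and LOOKAHEAD and the lookahead pattern itself are illustrative assumptions, not part of the answer's solution.

```python
import re

# Plain pattern from the answer: each match consumes its trailing period,
# so that period cannot also start the next match.
PLAIN = re.compile(r'\.\d+,\d+\.')

# Hypothetical variant (an assumption, not the answer's approach): the
# trailing period is only checked with a lookahead and is not consumed.
LOOKAHEAD = re.compile(r'\.(\d+,\d+)(?=\.)')

name = 'Test7.5,6.7,8.test'
plain_matches = PLAIN.findall(name)
lookahead_matches = LOOKAHEAD.findall(name)

print(plain_matches)      # ['.5,6.']
print(lookahead_matches)  # ['5,6', '7,8']
```

The plain pattern finds only the first group because regex matches cannot overlap, which is why ".5,6.7,8." has to be handled specially.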

Here is one potential solution:

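import re

def transform(s):
    # Replace each period in a run such as ".5,6." or ".5,6.7,8." with a newline, then split.
    s = re.sub(r'(\.\d+,\d+)+\.', lambda m: m.group(0).replace('.', '\n'), s)
    return tuple(s.split('\n'))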

Examples:

>>> transform('Test1.0,0.csv')
('Test1', '0,0', 'csv')
>>> transform('Test2.wma')
('Test2.wma',)
>>> transform('Test3.1100,456.jpg')
('Test3', '1100,456', 'jpg')
>>> transform('T.E.S.T.4.5,6.png')
('T.E.S.T.4', '5,6', 'png')
>>> transform('Test5,7,8.sss')
('Test5,7,8.sss',)
>>> transform('Test6.2,3,4.png')
('Test6.2,3,4.png',)
>>> transform('Test7.5,6.7,8.test')
('Test7', '5,6', '7,8', 'test')

To also get the file extension split off when there are no matches, you can use the following:

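import re

def transform(s):
    # Replace each period in a run such as ".5,6." or ".5,6.7,8." with a newline, then split.
    s = re.sub(r'(\.\d+,\d+)+\.', lambda m: m.group(0).replace('.', '\n'), s)
    groups = s.split('\n')
    # Split the extension off the last piece, if it still contains a period.
    groups[-1:] = groups[-1].rsplit('.', 1)
    return tuple(groups)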

This produces the same output as above, except that 'Test2.wma' becomes ('Test2', 'wma'), with similar behavior for 'Test5,7,8.sss' and 'Test6.2,3,4.png'.
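As a quick check, the second version of transform can be run against the expected outputs listed in the question. The sketch below repeats the function so it is self-contained. The EXPECTED table and the loop are illustrative additions, with the expected tuples copied from the question.

```python
import re

def transform(s):
    # Same logic as the second version in the answer.
    s = re.sub(r'(\.\d+,\d+)+\.', lambda m: m.group(0).replace('.', '\n'), s)
    groups = s.split('\n')
    groups[-1:] = groups[-1].rsplit('.', 1)
    return tuple(groups)

# Expected outputs taken from the examples in the question.
EXPECTED = {
    'Test1.0,0.csv': ('Test1', '0,0', 'csv'),
    'Test2.wma': ('Test2', 'wma'),
    'Test3.1100,456.jpg': ('Test3', '1100,456', 'jpg'),
    'T.E.S.T.4.5,6.png': ('T.E.S.T.4', '5,6', 'png'),
    'Test5,7,8.sss': ('Test5,7,8', 'sss'),
    'Test6.2,3,4.png': ('Test6.2,3,4', 'png'),
    'Test7.5,6.7,8.test': ('Test7', '5,6', '7,8', 'test'),
}

for name, expected in EXPECTED.items():
    result = transform(name)
    assert result == expected, (name, result)
    print(name, '->', result)
```

Note that this approach uses a newline as a temporary separator, so it assumes the file names themselves never contain newlines.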

answered Oct 14 '22 08:10

Andrew Clark
