What regex can I use to match ".#,#." within a string? It may or may not exist in the string. Some examples with expected outputs:
Test1.0,0.csv -> ('Test1', '0,0', 'csv') (Basic Example)
Test2.wma -> ('Test2', 'wma') (No Match)
Test3.1100,456.jpg -> ('Test3', '1100,456', 'jpg') (Basic with Large Number)
T.E.S.T.4.5,6.png -> ('T.E.S.T.4', '5,6', 'png') (Doesn't strip all periods)
Test5,7,8.sss -> ('Test5,7,8', 'sss') (No Match)
Test6.2,3,4.png -> ('Test6.2,3,4', 'png') (No Match, too many commas)
Test7.5,6.7,8.test -> ('Test7', '5,6', '7,8', 'test') (Double Match?)
The last one isn't too important and I would only expect that .#,#. would appear once. Most files I'm processing, I would expect to fall into the first through fourth examples, so I'm most interested in those.
Thanks for the help!
re.match(): The match() function of Python's re module checks for a match only at the beginning of the string. If the pattern matches there, it returns a match object; otherwise it returns None.
To match a character that has a special meaning in regex, prefix it with a backslash ( \ ) as an escape sequence. E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (backslash).
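As a small illustration of why the period has to be escaped here, the sketch below compares an unescaped and an escaped dot, and uses re.escape() to add the backslashes automatically. The variable names and sample strings are made up for the example.

```python
import re

# An unescaped "." matches any character, so "1.5" also matches "1x5".
unescaped_hits = bool(re.search(r"1.5", "1x5"))

# "\." matches only a literal period.
escaped_misses = bool(re.search(r"1\.5", "1x5"))
escaped_hits = bool(re.search(r"1\.5", "1.5"))

# re.escape() backslash-escapes the special characters in a literal string.
literal = "a+b(c)"
escaped_pattern = re.escape(literal)
literal_matches = bool(re.fullmatch(escaped_pattern, literal))

print(unescaped_hits, escaped_misses, escaped_hits, literal_matches)
```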
groups() method. This method returns a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern. The default argument is used for groups that did not participate in the match; it defaults to None.
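A minimal sketch of how groups() and its default argument behave when an optional group does not participate. The pattern below is a hypothetical one written only for this illustration: it assumes the base name contains no periods, so it is not a full answer to the question.

```python
import re

# Hypothetical pattern: name, optional "#,#" part, extension.
# It only handles base names made of word characters (no periods).
pattern = re.compile(r"(\w+?)(?:\.(\d+,\d+))?\.(\w+)")

with_numbers = pattern.fullmatch("Test1.0,0.csv").groups()
without_numbers = pattern.fullmatch("Test2.wma").groups()

# The default argument replaces None for groups that did not participate.
without_numbers_default = pattern.fullmatch("Test2.wma").groups(default="")

print(with_numbers)
print(without_numbers)
print(without_numbers_default)
```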
Python's re.search() method scans the entire target string for the regex pattern and returns a Match object for the first location where the pattern matches. It returns only that first match, or None if the pattern does not occur anywhere in the string.
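The difference between match() and search() matters for this question, because ".#,#." never sits at the start of the file name. A short sketch, with the capture groups added only for illustration:

```python
import re

pattern = re.compile(r"\.(\d+),(\d+)\.")
name = "Test1.0,0.csv"

# match() only tries at position 0, and the name does not start with ".".
at_start = pattern.match(name)

# search() scans forward and finds ".0,0." in the middle of the name.
anywhere = pattern.search(name)

print(at_start)
print(anywhere.group(0))
print(anywhere.groups())
```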
You can use the regex \.\d+,\d+\. to find all matches for that pattern, but you will need to do a little extra to get the output you expect, especially since you want to treat .5,6.7,8. as two matches.
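To see why the plain pattern is not enough for the double-match case: matches cannot overlap, so the period that ends the first match is no longer available to start the second. The sketch below shows this with re.findall(), along with one possible variation (a lookahead, which is an alternative technique and not the approach used in the solution that follows) that leaves the trailing period unconsumed.

```python
import re

name = "Test7.5,6.7,8.test"

# The trailing "\." consumes the period that ".7,8." would need to start with.
plain = re.findall(r"\.\d+,\d+\.", name)

# A lookahead checks for the trailing period without consuming it.
with_lookahead = re.findall(r"\.(\d+,\d+)(?=\.)", name)

print(plain)
print(with_lookahead)
```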
Here is one potential solution:
import re

def transform(s):
    # Turn the periods around each .#,#. run into newlines, then split on them.
    s = re.sub(r'(\.\d+,\d+)+\.', lambda m: m.group(0).replace('.', '\n'), s)
    return tuple(s.split('\n'))
Examples:
>>> transform('Test1.0,0.csv')
('Test1', '0,0', 'csv')
>>> transform('Test2.wma')
('Test2.wma',)
>>> transform('Test3.1100,456.jpg')
('Test3', '1100,456', 'jpg')
>>> transform('T.E.S.T.4.5,6.png')
('T.E.S.T.4', '5,6', 'png')
>>> transform('Test5,7,8.sss')
('Test5,7,8.sss',)
>>> transform('Test6.2,3,4.png')
('Test6.2,3,4.png',)
>>> transform('Test7.5,6.7,8.test')
('Test7', '5,6', '7,8', 'test')
To also get the file extension split off when there are no matches, you can use the following:
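def transform(s):
    # Turn the periods around each .#,#. run into newlines, then split on them.
    s = re.sub(r'(\.\d+,\d+)+\.', lambda m: m.group(0).replace('.', '\n'), s)
    groups = s.split('\n')
    # Split the extension off the last piece (a no-op if it has no period).
    groups[-1:] = groups[-1].rsplit('.', 1)
    return tuple(groups)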
This gives the same output as above, except that 'Test2.wma' becomes ('Test2', 'wma'), with similar behavior for 'Test5,7,8.sss' and 'Test6.2,3,4.png'.
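One way to check the second version against all of the examples from the question is a small loop like the one below. The expected dictionary and the loop are only an illustrative harness, not part of the solution itself.

```python
import re

def transform(s):
    s = re.sub(r'(\.\d+,\d+)+\.', lambda m: m.group(0).replace('.', '\n'), s)
    groups = s.split('\n')
    groups[-1:] = groups[-1].rsplit('.', 1)
    return tuple(groups)

# Inputs and expected outputs copied from the question.
expected = {
    'Test1.0,0.csv': ('Test1', '0,0', 'csv'),
    'Test2.wma': ('Test2', 'wma'),
    'Test3.1100,456.jpg': ('Test3', '1100,456', 'jpg'),
    'T.E.S.T.4.5,6.png': ('T.E.S.T.4', '5,6', 'png'),
    'Test5,7,8.sss': ('Test5,7,8', 'sss'),
    'Test6.2,3,4.png': ('Test6.2,3,4', 'png'),
    'Test7.5,6.7,8.test': ('Test7', '5,6', '7,8', 'test'),
}

results = {name: transform(name) for name in expected}
for name, result in results.items():
    status = 'ok' if result == expected[name] else 'MISMATCH'
    print(f'{status:8} {name} -> {result}')
```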