I'm trying to match (in Python) the show name and season/episode numbers from TV episode filenames in the format:
Show.One.S01E05.720p.HDTV.x264-CTU.mkv
and
Show.Two.S08E02.HDTV.XviD-LOL.avi
My regular expression:
(?P<show>[\w\s.,_-]+)\.[Ss]?(?P<season>[\d]{1,2})[XxEe]?(?P<episode>[\d]{2})
matches correctly on Show Two, giving me Show Two, 08 and 02. However, the 720 in Show One means I get back 7 and 20 for season/episode.
If I remove the ? after [XxEe] then it matches both types, but I want that range to be optional for filenames where the episode identifier isn't included.
I've tried using ?? to stop the [XxEe] match being greedy, as described in the re module section of the Python docs, but this has no effect.
How can I capture the series name section and the season/episode section while ignoring the rest of the string?
Make the first group non-greedy:
import re
p = re.compile(r'(?P<show>[\w\s.,_-]+?)\.[Ss]?(?P<season>[\d]{1,2})[XxEe]?(?P<episode>[\d]{2})')
print(p.findall("Game.of.Thrones.S01E05.720p.HDTV.x264-CTU.mkv"))
# [('Game.of.Thrones', '01', '05')]
print(p.findall("Entourage.S08E02.HDTV.XviD-LOL.avi"))
# [('Entourage', '08', '02')]
Note the ? following the + in the first group.
Explanation:
The first group eats too much, so making it non-greedy lets the rest of the pattern match at the earliest possible position. (Not a really nice example by the way; I would have changed the names, as they definitely sound a bit too Warezzz-y to be honest ;-) )
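To see the difference concretely, here is a minimal sketch that runs three variants of the pattern against the first filename from the question. The names GREEDY, LAZY_SEP and LAZY are just labels chosen for this illustration. LAZY_SEP is the ?? attempt described in the question, and it behaves like the greedy pattern because the show group has already swallowed S01E05 before [XxEe] is ever tried.

```python
import re

TAIL = r'\.[Ss]?(?P<season>\d{1,2})[XxEe]?(?P<episode>\d{2})'

# Original pattern: greedy show group.
GREEDY = re.compile(r'(?P<show>[\w\s.,_-]+)' + TAIL)
# Attempt from the question: only the [XxEe] separator is made lazy.
LAZY_SEP = re.compile(
    r'(?P<show>[\w\s.,_-]+)\.[Ss]?(?P<season>\d{1,2})[XxEe]??(?P<episode>\d{2})'
)
# Fix from the answer: the show group itself is lazy.
LAZY = re.compile(r'(?P<show>[\w\s.,_-]+?)' + TAIL)

name = "Show.One.S01E05.720p.HDTV.x264-CTU.mkv"

greedy_result = GREEDY.search(name).groups()
lazy_sep_result = LAZY_SEP.search(name).groups()
lazy_result = LAZY.search(name).groups()

print(greedy_result)    # ('Show.One.S01E05', '7', '20')
print(lazy_sep_result)  # ('Show.One.S01E05', '7', '20')
print(lazy_result)      # ('Show.One', '01', '05')
```

The greedy show group first consumes the whole filename and then backtracks from the right, so the first place the rest of the pattern fits is .720. The lazy group grows from the left instead, so it stops at the first place the rest fits, which is .S01E05.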
Try this (note the ? added after the first +):
(?P<show>[\w\s.,_-]+?)\.[Ss]?(?P<season>[\d]{1,2})[XxEe]?(?P<episode>[\d]{2})
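One possible way to wrap this pattern in a small helper is sketched below. The names EPISODE_RE and parse_episode are hypothetical, and replacing the dots in the show name with spaces and converting the numbers to int are assumptions about what the caller wants, not something the pattern does by itself. The third filename is made up to illustrate the case without S/E markers.

```python
import re

# Lazy show group, so the season/episode part matches at its first occurrence.
EPISODE_RE = re.compile(
    r'(?P<show>[\w\s.,_-]+?)\.[Ss]?(?P<season>\d{1,2})[XxEe]?(?P<episode>\d{2})'
)

def parse_episode(filename):
    """Return (show, season, episode), or None if the name does not match."""
    m = EPISODE_RE.match(filename)
    if m is None:
        return None
    # Assumption: dots in the show name are word separators.
    show = m.group('show').replace('.', ' ')
    return show, int(m.group('season')), int(m.group('episode'))

print(parse_episode("Show.One.S01E05.720p.HDTV.x264-CTU.mkv"))  # ('Show One', 1, 5)
print(parse_episode("Show.Two.S08E02.HDTV.XviD-LOL.avi"))       # ('Show Two', 8, 2)
print(parse_episode("Show.Three.105.HDTV.avi"))                 # ('Show Three', 1, 5)
print(parse_episode("notes.txt"))                               # None
```

Named groups are read with m.group('show') and so on, which keeps the code readable if the pattern is later reordered.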