
How can I write a regex to match a torrent's title format?

Tags: c#, regex

I'm trying to match and break up a typical TV torrent title:

MyTV.Show.S09E01.HDTV.XviD
MyTV.Show.S10E02.HDTV.XviD
MyTV.Show.901.HDTV.XviD
MyTV.Show.1102.HDTV.XviD

I'm trying to break these strings up into 3 capture groups for each entry: Title, Season, Episode.

I can handle the first two easily enough:

^([a-zA-Z0-9.]*)\.S([0-9]{1,2})E([0-9]{1,2}).*$

However, the third and fourth prove difficult: I can't see how to split the season from the episode. If I could work backwards it would be easier. For example, with "901", working from the right I would take the last two digits as the episode number, and anything remaining before them as the season number.

Does anyone have any tips for how I can break these strings up into those relevant capture groups?

KingNestor asked Sep 27 '10 23:09


2 Answers

Here's what I would use:

(.*?)\.S?(\d{1,2})E?(\d{2})\.(.*)

It has these capture groups:

1: Name
2: Season
3: Episode
4: The Rest

Here's some code in C# (adapted from another post):

using System;
using System.Text.RegularExpressions;

public class Test
{

    public static void Main()
    {
        string s = @"MyTV.Show.S09E01.HDTV.XviD
            MyTV.Show.S10E02.HDTV.XviD
            MyTV.Show.901.HDTV.XviD
            MyTV.Show.1102.HDTV.XviD";

        Extract(s);

    }

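    // Lazy name, optional "S", 1-2 season digits, optional "E",
    // exactly 2 episode digits, then a dot and the rest of the line.
    private static readonly Regex rx = new Regex
        (@"(.*?)\.S?(\d{1,2})E?(\d{2})\.(.*)", RegexOptions.IgnoreCase);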

    static void Extract(string text)
    {
        MatchCollection matches = rx.Matches(text);

        foreach (Match match in matches)
        {
            // Trim() strips the indentation and line endings left over from the verbatim string.
            Console.WriteLine("Name: {0}, Season: {1}, Ep: {2}, Stuff: {3}",
                match.Groups[1].ToString().Trim(), match.Groups[2],
                match.Groups[3], match.Groups[4].ToString().Trim());
        }
    }

}

Produces:

Name: MyTV.Show, Season: 09, Ep: 01, Stuff: HDTV.XviD
Name: MyTV.Show, Season: 10, Ep: 02, Stuff: HDTV.XviD
Name: MyTV.Show, Season: 9, Ep: 01, Stuff: HDTV.XviD
Name: MyTV.Show, Season: 11, Ep: 02, Stuff: HDTV.XviD
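
If you want the season and episode as numbers rather than strings, a minimal sketch along the same lines might look like the following. The TitleParser class, the TryParse method, and the group names are hypothetical names picked for illustration. The pattern is the same one as above with ^ and $ added, which assumes you pass in one title per call rather than a multi-line block.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class TitleParser
{
    // Same pattern as above, with named groups and anchors for a single title.
    private static readonly Regex Rx = new Regex(
        @"^(?<name>.*?)\.S?(?<season>\d{1,2})E?(?<episode>\d{2})\.(?<rest>.*)$",
        RegexOptions.IgnoreCase);

    // Returns false when the title does not fit either format.
    public static bool TryParse(string title, out string name, out int season, out int episode)
    {
        name = string.Empty;
        season = 0;
        episode = 0;

        Match m = Rx.Match(title.Trim());
        if (!m.Success)
        {
            return false;
        }

        name = m.Groups["name"].Value;
        // int.Parse drops the leading zero, so "09" becomes 9.
        season = int.Parse(m.Groups["season"].Value);
        episode = int.Parse(m.Groups["episode"].Value);
        return true;
    }
}
```

Because the season group is tried greedily and the episode group needs exactly two digits, "901" backtracks to season 9, episode 01, which is the "work backwards" behaviour asked for in the question.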
NullUserException answered Sep 25 '22 01:09


Almost every media file I've ever seen that came from a torrent had a two-digit episode number. Given that, you should be able to use E([0-9]{2}) instead and get the expression to match.

I'd estimate 99.9% of shows are marked with two-digit episodes. If you're writing a script to label your own shows, I'd go with the two-digit episode assumption and manually rename any mistagged files you come across. If you're writing something for public consumption, you'll probably have many more naming schemes to consider. I've seen other applications attempt this in the past, and all have worked just so-so. It's a hard problem that probably has no single solution.
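
As a rough sketch of that approach (not a complete solution), you could keep a short list of patterns that all assume a two-digit episode, try them in order, and treat anything that matches none of them as a file to rename by hand. The EpisodeTagger class and Tag method are hypothetical names. The first pattern is the one from the question with the episode fixed at two digits, and the second is my own guess at a pattern for the bare "901" style.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the two-digit episode assumption.
public static class EpisodeTagger
{
    // Patterns are tried in order; each one requires exactly two episode digits.
    private static readonly Regex[] Patterns =
    {
        // "S09E01" style: the question's pattern with E([0-9]{2}).
        new Regex(@"^([a-zA-Z0-9.]*)\.S([0-9]{1,2})E([0-9]{2}).*$", RegexOptions.IgnoreCase),
        // "901" / "1102" style: the last two digits of the block are the episode.
        new Regex(@"^([a-zA-Z0-9.]*?)\.([0-9]{1,2})([0-9]{2})\..*$"),
    };

    // Returns null when no pattern matches, so the caller can flag the file.
    public static (string Title, int Season, int Episode)? Tag(string fileName)
    {
        foreach (Regex rx in Patterns)
        {
            Match m = rx.Match(fileName);
            if (m.Success)
            {
                return (m.Groups[1].Value,
                        int.Parse(m.Groups[2].Value),
                        int.Parse(m.Groups[3].Value));
            }
        }
        return null; // unrecognized format: leave it for manual renaming
    }
}
```

Supporting another naming scheme then means adding one more entry to the array, and a name such as MyTV.Show.S9E1.HDTV, which breaks the two-digit assumption, comes back as null instead of being guessed at.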

Dave McClelland answered Sep 23 '22 01:09