Pattern matching and placeholder values

Tags:

c#

regex

I'm writing an application that uses renaming rules to rename a list of files based on information given by the user. The files may be inconsistently named to begin with, or the filenames may be consistent. The user selects a list of files, and inputs information about the files (for MP3s, they would be Artist, Title, Album, etc). Using a rename rule (example below), the program uses the user-inputted information to rename the files accordingly.

However, if all or some of the files are named consistently, I would like to allow the program to 'guess' the file information. That is the problem I'm having. What is the best way to do this?

Sample filenames:

Kraftwerk-Kraftwerk-01-RuckZuck.mp3
Kraftwerk-Autobahn-01-Autobahn.mp3
Kraftwerk-Computer World-03-Numbers.mp3

Rename Rule:

%Artist%-%Album%-%Track%-%Title%.mp3

The program should properly deduce the Artist, Track number, Title, and Album name.

Again, what's the best way to do this? I was thinking of using regular expressions, but I'm a bit confused.

Mike Christiansen asked Oct 30 '08 21:10


1 Answer

The easiest approach is to replace each %Label% with (?<Label>.*?) and escape all other characters.

%Artist%-%Album%-%Track%-%Title%.mp3

becomes

(?<Artist>.*?)-(?<Album>.*?)-(?<Track>.*?)-(?<Title>.*?)\.mp3

Each component then ends up in its own named capture group.
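To illustrate reading the named groups, here is a small sketch that applies the translated pattern to the three sample filenames from the question and pulls each field out by name with match.Groups["Label"]. It assumes C# 9 or later (top-level statements), and it adds ^ and $ anchors that are not in the pattern above, so that the whole filename has to fit the rule.

```csharp
using System;
using System.Text.RegularExpressions;

// The translated rule, anchored with ^...$ (an addition) so the whole filename must match.
var filenameRe = new Regex(@"^(?<Artist>.*?)-(?<Album>.*?)-(?<Track>.*?)-(?<Title>.*?)\.mp3$");

string[] files =
{
    "Kraftwerk-Kraftwerk-01-RuckZuck.mp3",
    "Kraftwerk-Autobahn-01-Autobahn.mp3",
    "Kraftwerk-Computer World-03-Numbers.mp3",
};

foreach (string file in files)
{
    Match m = filenameRe.Match(file);
    if (!m.Success) continue; // this filename does not follow the rule

    // Each placeholder's text is available under its group name.
    Console.WriteLine($"{m.Groups["Artist"].Value} | {m.Groups["Album"].Value} | " +
                      $"{m.Groups["Track"].Value} | {m.Groups["Title"].Value}");
}
```

Because every field is lazy and separated by a literal '-', each group stops at the first dash it meets, which works here since none of the sample fields contains a dash.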

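using System.Collections.Generic;
using System.Text.RegularExpressions;

// Turns a rule such as "%Artist%-%Album%-%Track%-%Title%.mp3" into a regex
// with one named group per %Label%, then matches it against the filename.
static Dictionary<string, string> MatchFilename(string rule, string filename)
{
    // Escape the rule first so literal characters such as '.' match literally;
    // Regex.Escape leaves '%' and word characters untouched.
    Regex tagRe = new Regex(@"%(\w+)%");
    string pattern = "^" + tagRe.Replace(Regex.Escape(rule), @"(?<$1>.*?)") + "$";
    Regex filenameRe = new Regex(pattern);
    Match match = filenameRe.Match(filename);

    var tokens = new Dictionary<string, string>();
    if (!match.Success)
        return tokens; // the filename does not follow the rule

    // Group 0 is the whole match; the named groups are numbered from 1.
    for (int counter = 1; counter < match.Groups.Count; counter++)
    {
        string groupName = filenameRe.GroupNameFromNumber(counter);
        tokens.Add(groupName, match.Groups[counter].Value);
    }
    return tokens;
}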

But if the user leaves out the delimiters, or if a delimiter can also appear inside a field, you could get some strange results. The pattern for %Artist%%Album% would become (?<Artist>.*?)(?<Album>.*?), which is equivalent to .*?.*?, so the regex has no way of knowing where to split.
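Both failure modes can be reproduced in a few lines. This sketch assumes C# 9 or later; the filenames are hypothetical ("Trans-Europe Express" is used only because it is an album title containing the '-' delimiter), and the patterns are the anchored translations of the rules, written out by hand.

```csharp
using System;
using System.Text.RegularExpressions;

// Case 1: the Album field contains the '-' delimiter (hypothetical filename).
var dashed = new Regex(@"^(?<Artist>.*?)-(?<Album>.*?)-(?<Track>.*?)-(?<Title>.*?)\.mp3$");
Match m1 = dashed.Match("Kraftwerk-Trans-Europe Express-01-Europe Endless.mp3");
// Each lazy group stops at the first dash it meets, so
// Album = "Trans", Track = "Europe Express", Title = "01-Europe Endless".
Console.WriteLine($"Album={m1.Groups["Album"].Value} Track={m1.Groups["Track"].Value} " +
                  $"Title={m1.Groups["Title"].Value}");

// Case 2: no delimiter between %Artist% and %Album%.
var glued = new Regex(@"^(?<Artist>.*?)(?<Album>.*?)$");
Match m2 = glued.Match("KraftwerkAutobahn");
// Artist stays empty (it is lazy) and Album absorbs the whole name.
Console.WriteLine($"Artist='{m2.Groups["Artist"].Value}' Album='{m2.Groups["Album"].Value}'");
```

Both matches succeed, which is what makes the problem easy to miss: the regex reports a valid split, just not the intended one.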

This can be mitigated if you know the format of certain fields, such as the track number. If you translate %Track% to (?<Track>\d+) instead, the Track group can only match digits, which gives the regex a fixed point to split the rest of the filename around.
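One way to apply that idea is to look up a sub-pattern per label while translating the rule, falling back to .*? for labels without a known format. In this sketch the fieldPatterns table and the BuildPattern helper are hypothetical names, the ^/$ anchors are an addition, and C# 9 or later is assumed; it reuses the hypothetical "Trans-Europe Express" filename.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical table of known field formats; labels not listed fall back to ".*?".
var fieldPatterns = new Dictionary<string, string> { ["Track"] = @"\d+" };

// Hypothetical helper: escape the rule, then swap each %Label% for a named group.
string BuildPattern(string rule)
{
    string escaped = Regex.Escape(rule);
    string body = Regex.Replace(escaped, @"%(\w+)%", tag =>
    {
        string label = tag.Groups[1].Value;
        string sub = fieldPatterns.TryGetValue(label, out var p) ? p : ".*?";
        return $"(?<{label}>{sub})";
    });
    return "^" + body + "$";
}

var re = new Regex(BuildPattern("%Artist%-%Album%-%Track%-%Title%.mp3"));
Match m = re.Match("Kraftwerk-Trans-Europe Express-01-Europe Endless.mp3");
// Track must be digits, so the engine keeps extending Album
// until it reaches "Trans-Europe Express" and Track can match "01".
Console.WriteLine($"Album={m.Groups["Album"].Value} Track={m.Groups["Track"].Value} " +
                  $"Title={m.Groups["Title"].Value}");
```

This only helps where a field's format is actually constrained; two adjacent free-text fields with no delimiter between them remain ambiguous.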

Markus Jarderot answered Sep 23 '22 05:09