Iterative regex capturing in C#

I have to read in a file that contains a number of coordinates. The file is structured in the following way:

X1/Y1,X2/Y2,X3/Y3,X4/Y4

Where X and Y are positive integers. To solve this problem I want to use a regex (I think this is in general a good idea because of minimal refactoring when the pattern changes).

Therefore I have developed the following regex:

Regex r = new Regex(@"^(?<Coor>(?<X>[0-9]+)/(?<Y>[0-9]+))(,(?<Coor>(?<X>[0-9]+)/(?<Y>[0-9]+)))*$");

However when I test this regex on data, for example:

1302/1425,1917/2010

The regex only seems to retain the last X, Y and Coor group. In this case Coor is "1917/2010", X is "1917" and Y is "2010". Is there a way to generate some sort of tree, so that I get an object that gives me all the Coor matches, with an X and a Y component under each Coor?

If possible, I would like to use only one regex, because the format could change later.

Willem Van Onsem asked Dec 21 '22 22:12


1 Answer

You can quite easily solve this without any regular expression by using string.Split and int.Parse:

var coords = s.Split(',')
    .Select(x => x.Split('/'))
    .Select(a => new {
        X = int.Parse(a[0]),
        Y = int.Parse(a[1])
    });
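The snippet above throws on malformed input (int.Parse raises FormatException, and a[1] fails when a pair has no slash). As a rough sketch of a more defensive variant, the helper below returns false instead of throwing. The names SplitParser and TryParse are hypothetical, and NumberStyles.None is an assumption chosen so that only plain digit strings are accepted, in line with [0-9]+ in the question.

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class SplitParser
{
    // Hypothetical helper: parses "X1/Y1,X2/Y2,..." without throwing.
    // Returns false (and an empty list) if any pair is malformed.
    public static bool TryParse(string s, out List<(int X, int Y)> coords)
    {
        coords = new List<(int X, int Y)>();
        foreach (string pair in s.Split(','))
        {
            string[] parts = pair.Split('/');
            // NumberStyles.None rejects signs and surrounding whitespace,
            // so only plain digit strings that fit in an int are accepted.
            if (parts.Length != 2
                || !int.TryParse(parts[0], NumberStyles.None, CultureInfo.InvariantCulture, out int x)
                || !int.TryParse(parts[1], NumberStyles.None, CultureInfo.InvariantCulture, out int y))
            {
                coords.Clear();
                return false;
            }
            coords.Add((x, y));
        }
        return true;
    }
}
```

Calling SplitParser.TryParse("1302/1425,1917/2010", out var coords) should yield two pairs, while an input such as "1302/1425," is rejected because the trailing element is empty.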

If you want to use a regular expression to validate the string you could do it like this:

"^(?!,)(?:(?:^|,)[0-9]+/[0-9]+)*$"
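To illustrate how that validation pattern behaves, here is a small sketch that wraps it in Regex.IsMatch. The class and method names (CoordinateValidator, IsValid) are hypothetical; the pattern itself is copied unchanged from above.

```csharp
using System.Text.RegularExpressions;

public static class CoordinateValidator
{
    // The validation pattern from the answer, unchanged.
    private static readonly Regex Pattern =
        new Regex("^(?!,)(?:(?:^|,)[0-9]+/[0-9]+)*$");

    // True if s is a comma-separated list of digits/digits pairs.
    public static bool IsValid(string s) => Pattern.IsMatch(s);
}
```

With this, "1302/1425,1917/2010" is accepted, while ",1302/1425", "1302/1425," and "1302-1425" are rejected. Note that the pattern also accepts the empty string, because the group may repeat zero times, and that $ in .NET also matches just before a final newline; if either matters for your file, check for it separately.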

If you also want to use a regular expression based approach for extracting the data, you could first validate the string using the above regular expression and then extract the data as follows:

var coords = Regex.Matches(s, "([0-9]+)/([0-9]+)")
    .Cast<Match>()
    .Select(match => new
    {
        X = int.Parse(match.Groups[1].Value),
        Y = int.Parse(match.Groups[2].Value)
    });

If you really want to perform the validation and data extraction simultaneously with a single regular expression you can use two capturing groups and find the results in the Captures property for each group. Here's one way you could perform both the validation and data extraction using a single regular expression:

List<Group> groups =
    Regex.Matches(s, "^(?!,)(?:(?:^|,)([0-9]+)/([0-9]+))*$")
         .Cast<Match>().First()
         .Groups.Cast<Group>().Skip(1)
         .ToList();

var coords = Enumerable.Range(0, groups[0].Captures.Count)
    .Select(i => new
    {
        X = int.Parse(groups[0].Captures[i].Value),
        Y = int.Parse(groups[1].Captures[i].Value)
    });
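The same Captures idea also works with the named groups from the question, since .NET lets a group name appear more than once in a pattern and collects every capture under that name. The following is only a sketch: the class and method names (CoordinateParser, Parse) are hypothetical, and throwing FormatException on invalid input is an assumption about how you want failures reported.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CoordinateParser
{
    // The question's pattern (with the opening quote in place).
    private static readonly Regex Pattern = new Regex(
        @"^(?<Coor>(?<X>[0-9]+)/(?<Y>[0-9]+))(,(?<Coor>(?<X>[0-9]+)/(?<Y>[0-9]+)))*$");

    // Validates and extracts in one pass. Group.Value only holds the
    // last capture, so every capture is read from Group.Captures.
    public static List<(int X, int Y)> Parse(string s)
    {
        Match m = Pattern.Match(s);
        if (!m.Success)
            throw new FormatException("Input is not in the X/Y,X/Y,... format.");

        CaptureCollection xs = m.Groups["X"].Captures;
        CaptureCollection ys = m.Groups["Y"].Captures;

        var result = new List<(int X, int Y)>(xs.Count);
        for (int i = 0; i < xs.Count; i++)
        {
            // int.Parse can still throw OverflowException for very long digit runs.
            result.Add((int.Parse(xs[i].Value), int.Parse(ys[i].Value)));
        }
        return result;
    }
}
```

For the input "1302/1425,1917/2010" this returns the pairs (1302, 1425) and (1917, 2010). The whole pairs are available in the same way through m.Groups["Coor"].Captures.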

However, you may want to consider whether the complexity of this solution is worth it compared to the string.Split based solution.

Mark Byers answered Dec 24 '22 12:12