
Can you improve this C# regular expression code?

Tags:

c#

regex

In a program I'm reading in some data files, parts of which are formatted as a series of records, each in square brackets. Each record contains a section title and a series of key/value pairs.

I originally wrote code to loop through and extract the values, but decided it could be done more elegantly using regular expressions. Below is my resulting code (I just hacked it out in a console app for now, so I know the variable names aren't that great, etc.).

Can you suggest improvements? I feel it shouldn't be necessary to do two matches and a substring, but can't figure out how to do it all in one big step:

string input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";

MatchCollection matches = Regex.Matches(input, @"\[[^\]]*\]");
foreach (Match match in matches)
{
    string subinput = match.Value;

    int firstSpace = subinput.IndexOf(' ');
    string section = subinput.Substring(1, firstSpace-1);
    Console.WriteLine(section);

    MatchCollection newMatches = Regex.Matches(subinput.Substring(firstSpace + 1), @"\s*(\w+)\s*=\s*(\w+)\s*");
    foreach (Match newMatch in newMatches)
    {
        Console.WriteLine("{0}={1}", newMatch.Groups[1].Value, newMatch.Groups[2].Value);
    }
}
Saqib asked Jun 22 '09 22:06



2 Answers

I prefer named captures, nice formatting, and clarity:

string input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";
MatchCollection matches = Regex.Matches(input, @"\[
        (?<sectionName>\S+)
        (\s+
            (?<key>[^=]+)
            =
            (?<value>[^ \] ]+)
        )+
    \]", RegexOptions.IgnorePatternWhitespace);

foreach(Match currentMatch in matches)
{
    Console.WriteLine("Section: {0}", currentMatch.Groups["sectionName"].Value);
    CaptureCollection keys = currentMatch.Groups["key"].Captures;
    CaptureCollection values = currentMatch.Groups["value"].Captures;

    for(int i = 0; i < keys.Count; i++)
    {
        Console.WriteLine("{0}={1}", keys[i].Value, values[i].Value);
    }
}
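
If the pairs need to be looked up later rather than just printed, the key and value captures can be zipped into a dictionary per section. The following is only a sketch under a few assumptions that are not part of the answer above: `RecordParser` and `ParseRecords` are hypothetical names, the key is tightened to exclude whitespace and `]`, the repeated group uses `*` so that a record with no pairs still matches, and section names and keys are assumed to be unique (a later duplicate overwrites an earlier one).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RecordParser
{
    // Variation on the named-capture pattern; '#' starts a comment because
    // IgnorePatternWhitespace is enabled.
    private static readonly Regex RecordPattern = new Regex(
        @"\[
            (?<sectionName>[^\s\]]+)    # section title
            (?:\s+
                (?<key>[^\s=\]]+)       # key: no whitespace, '=' or ']'
                =
                (?<value>[^\s\]]+)      # value: no whitespace or ']'
            )*
          \]",
        RegexOptions.IgnorePatternWhitespace);

    // Maps each section name to its key/value pairs.
    public static Dictionary<string, Dictionary<string, string>> ParseRecords(string input)
    {
        var result = new Dictionary<string, Dictionary<string, string>>();
        foreach (Match m in RecordPattern.Matches(input))
        {
            // key and value are captured in the same iteration of the group,
            // so the two collections always have the same length.
            CaptureCollection keys = m.Groups["key"].Captures;
            CaptureCollection values = m.Groups["value"].Captures;

            var pairs = new Dictionary<string, string>();
            for (int i = 0; i < keys.Count; i++)
            {
                pairs[keys[i].Value] = values[i].Value;
            }
            result[m.Groups["sectionName"].Value] = pairs;
        }
        return result;
    }
}
```

With the sample input, `RecordParser.ParseRecords(input)["section2"]["key3"]` would then give `value3`.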
Jeff Moser answered Oct 02 '22 18:10


You should take advantage of the collections to get each key. So something like this then:

        string input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";

        Regex r = new Regex(@"(\[(\S+) (\s*\w+\s*=\s*\w+\s*)*\])", RegexOptions.Compiled);

        foreach (Match m in r.Matches(input))
        {
            Console.WriteLine(m.Groups[2].Value);
            foreach (Capture c in m.Groups[3].Captures)
            {
                Console.WriteLine(c.Value);
            }
        }

Resulting output:

section1
key1=value1
key2=value2
section2
key1=value1
key2=value2
key3=value3
section3
key1=value1
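
One thing to be aware of with this pattern: each capture in group 3 is the whole `key=value` text, including any whitespace consumed by the trailing `\s*`, so the first line printed for `section1` actually ends with a space. If the key and value are needed separately, one option is to trim and split each capture. A minimal sketch, reusing the same pattern; `CaptureSplitter` and `SplitPairs` are hypothetical names introduced only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureSplitter
{
    private static readonly Regex Pattern =
        new Regex(@"(\[(\S+) (\s*\w+\s*=\s*\w+\s*)*\])", RegexOptions.Compiled);

    // Returns one { section, key, value } array per pair found in the input.
    public static List<string[]> SplitPairs(string input)
    {
        var rows = new List<string[]>();
        foreach (Match m in Pattern.Matches(input))
        {
            string section = m.Groups[2].Value;
            foreach (Capture c in m.Groups[3].Captures)
            {
                // The pattern guarantees exactly one '=' in each capture.
                string[] parts = c.Value.Split(new[] { '=' }, 2);
                rows.Add(new[] { section, parts[0].Trim(), parts[1].Trim() });
            }
        }
        return rows;
    }
}
```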
patjbs answered Oct 02 '22 18:10