
parsing of a string containing an array

Tags: string, c#, list

I'd like to convert a string containing a nested array of strings into an array of depth one, keeping any nested arrays as unparsed strings.

Example:

StringToArray("[a, b, [c, [d, e]], f, [g, h], i]") == ["a", "b", "[c, [d, e]]", "f", "[g, h]", "i"]

Seems quite simple. But I come from a functional background and I'm not that familiar with the .NET Framework standard libraries, so every attempt (I have started from scratch about three times) ends up as plain ugly code. My latest implementation is here. As you can see, it's ugly as hell.

So, what's the C# way to do this?

dijxtra asked Nov 01 '11 01:11




2 Answers

@ojlovecd has a good answer using regular expressions.
However, it is more complicated than it needs to be, so here's my similar, simpler answer.

// Requires: using System.Linq; using System.Text.RegularExpressions;
public string[] StringToArray(string input) {
    var pattern = new Regex(@"
        \[
            (?:
            \s*
                (?<results>(?:
                (?(open)  [^\[\]]+  |  [^\[\],]+  )
                |(?<open>\[)
                |(?<-open>\])
                )+)
                (?(open)(?!))
            ,?
            )*
        \]
    ", RegexOptions.IgnorePatternWhitespace);

    // Find the first match:
    var result = pattern.Match(input);
    if (result.Success) {
        // Extract the captured values:
        var captures = result.Groups["results"].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
        return captures;
    }
    // Not a match
    return null;
}

Using this code, you will see that StringToArray("[a, b, [c, [d, e]], f, [g, h], i]") will return the following array: ["a", "b", "[c, [d, e]]", "f", "[g, h]", "i"].

For more information on the balancing groups that I used for matching balanced brackets, take a look at Microsoft's documentation.
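
As a rough illustration of how the balancing-group constructs behave on their own, here is a minimal sketch. The class and method names (BalancedBrackets, IsBalanced) are made up for this example and are not part of the code above; the sketch only checks that square brackets are balanced and does not split anything.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancedBrackets
{
    // Hypothetical helper for illustration only.
    // (?<open>\[)   pushes a capture onto the 'open' stack.
    // (?<-open>\])  pops one capture, and fails to match if the stack is empty.
    // (?(open)(?!)) forces a failure if anything is still on the stack at the end.
    private static readonly Regex Balanced = new Regex(
        @"^(?:[^\[\]]|(?<open>\[)|(?<-open>\]))*(?(open)(?!))$");

    public static bool IsBalanced(string input) => Balanced.IsMatch(input);
}
```

Under these assumptions, BalancedBrackets.IsBalanced("[c, [d, e]]") is true, while "[c, [d, e]" (a bracket left open) and "c]" (a close with nothing to pop) are both rejected.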

Update:
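As per the comments, if you also want to balance quotes, here's a possible modification. (Note that in a C# verbatim string the " is escaped as "".) The quote handling applies only outside nested brackets, and it does not check that the opening and closing quote characters are the same. I also added descriptions of the pattern to help clarify it: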

    var pattern = new Regex(@"
        \[
            (?:
            \s*
                (?<results>(?:              # Capture everything into 'results'
                    (?(open)                # If 'open' Then
                        [^\[\]]+            #   Capture everything but brackets
                        |                   # Else (not open):
                        (?:                 #   Capture either:
                            [^\[\],'""]+    #       Unimportant characters
                            |               #   Or
                            ['""][^'""]*?['""] #    Anything between quotes
                        )  
                    )                       # End If
                    |(?<open>\[)            # Open bracket
                    |(?<-open>\])           # Close bracket
                )+)
                (?(open)(?!))               # Fail while there's an unbalanced 'open'
            ,?
            )*
        \]
    ", RegexOptions.IgnorePatternWhitespace);
Scott Rippey answered Oct 19 '22 07:10


You can solve this with Regex:

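// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
static string[] StringToArray(string str)
{
    // Strip the outer brackets; return null if the input is not a bracketed list.
    Regex reg = new Regex(@"^\[(.*)\]$");
    Match match = reg.Match(str);
    if (!match.Success)
        return null;
    str = match.Groups[1].Value;
    // Matches one balanced nested array, using the 'Open' balancing group.
    reg = new Regex(@"\[[^\[\]]*(((?'Open'\[)[^\[\]]*)+((?'-Open'\])[^\[\]]*)+)*(?(Open)(?!))\]");
    // Swap each nested array for a unique placeholder so its commas survive the split.
    Dictionary<string, string> dic = new Dictionary<string, string>();
    int index = 0;
    str = reg.Replace(str, m =>
    {
        string temp = "ojlovecd" + (index++).ToString();
        dic.Add(temp, m.Value);
        return temp;
    });
    // Split on the remaining top-level commas, then restore the placeholders.
    string[] result = str.Split(',');
    for (int i = 0; i < result.Length; i++)
    {
        string s = result[i].Trim();
        string nested;
        result[i] = dic.TryGetValue(s, out nested) ? nested.Trim() : s;
    }
    return result;
}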
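
The code above swaps each nested array for a placeholder so that Split(',') only sees top-level commas. As a cross-check, the same top-level split can be sketched without Regex by tracking bracket depth. This is a different technique from the one used in either answer, and the names DepthSplitter and SplitTopLevel are made up for the example; it assumes elements contain no quoted brackets or commas.

```csharp
using System;
using System.Collections.Generic;

public static class DepthSplitter
{
    // Hypothetical helper: returns the top-level elements of "[a, [b, c], d]",
    // or null if the input is not a bracketed, balanced list.
    public static string[] SplitTopLevel(string input)
    {
        if (input == null) return null;
        input = input.Trim();
        if (input.Length < 2 || input[0] != '[' || input[input.Length - 1] != ']')
            return null;

        var items = new List<string>();
        int depth = 0;  // nesting depth inside the outer brackets
        int start = 1;  // index where the current element begins
        for (int i = 1; i < input.Length - 1; i++)
        {
            char c = input[i];
            if (c == '[') depth++;
            else if (c == ']') depth--;
            else if (c == ',' && depth == 0)
            {
                // A comma at depth 0 ends the current element.
                items.Add(input.Substring(start, i - start).Trim());
                start = i + 1;
            }
            if (depth < 0) return null; // closing bracket without an opener
        }
        if (depth != 0) return null;    // a nested bracket was never closed

        string last = input.Substring(start, input.Length - 1 - start).Trim();
        if (last.Length > 0 || items.Count > 0) items.Add(last);
        return items.ToArray();
    }
}
```

With the question's input, DepthSplitter.SplitTopLevel("[a, b, [c, [d, e]], f, [g, h], i]") yields the six elements a, b, [c, [d, e]], f, [g, h] and i.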
ojlovecd answered Oct 19 '22 08:10