
Split a string, ignoring delimiters inside brackets or braces

Tags: c#, regex

I have a string like

a,[1,2,3,{4,5},6],b,{c,d,[e,f],g},h

After splitting by , I expect to get 5 items; any , inside braces or brackets should be ignored.

a

[1,2,3,{4,5},6]

b

{c,d,[e,f],g}

h

There is no whitespace in the string. Is there a regular expression that can do this?

bxx asked Aug 05 '13 08:08



3 Answers

You could use this:

// Requires: using System.Linq; using System.Text.RegularExpressions;
var input = "a,[1,2,3,{4,5}],b,{c,d,[e,f]},g";
var result =
    (from Match m in Regex.Matches(input, @"\[[^]]*]|\{[^}]*}|[^,]+")
     select m.Value)
    .ToArray();

This will find any matches like:

  • [ followed by any characters other than ], then terminated by ]
  • { followed by any characters other than }, then terminated by }
  • One or more characters other than ,

This will work for your sample input, but it cannot handle nested groups of the same type, like [1,[2,3],4] or {1,{2,3},4}. For that, I'd recommend something a bit more powerful than regular expressions. Since you've mentioned in your comments that you're trying to parse JSON, I'd recommend you check out the excellent Json.NET library.
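
To make the limitation concrete, here is a small sketch that runs the same pattern on the string from the question and on an input with same-type nesting. The class name `FlatRegexDemo` and the second test string are made up for illustration; only the pattern comes from the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class FlatRegexDemo
{
    // Same pattern as in the answer above.
    const string Pattern = @"\[[^]]*]|\{[^}]*}|[^,]+";

    public static string[] Split(string input) =>
        Regex.Matches(input, Pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    static void Main()
    {
        // Mixed nesting works, because no ']' appears before the outer ']'.
        Console.WriteLine(string.Join(" | ", Split("a,[1,2,3,{4,5},6],b,{c,d,[e,f],g},h")));
        // a | [1,2,3,{4,5},6] | b | {c,d,[e,f],g} | h

        // Same-type nesting breaks: the first ']' ends the bracket match early.
        Console.WriteLine(string.Join(" | ", Split("x,[1,[2,3],4],y")));
        // x | [1,[2,3] | 4] | y
    }
}
```

In the second case the pattern returns four items, two of them broken, which is why a depth-aware approach is needed for arbitrary nesting.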

p.s.w.g answered Nov 05 '22 02:11



Regular expressions* cannot be used to parse nested structures**.

(* True regular expressions without non-regular extensions)

(** Nested structures of arbitrary depth and interleaving)
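
For illustration, .NET's balancing groups are one such non-regular extension. The following is only a sketch of how they could be used here, not something taken from the answers; the class name `BalancedRegexDemo` and the group name `open` are arbitrary. It assumes well-formed input and does not check that a `]` closes a `[` rather than a `{`.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class BalancedRegexDemo
{
    // Each match is one top-level item. The named group "open" acts as a stack:
    //   [^,\[\]{}]        an ordinary character
    //   (?<open>[\[{])    an opener, pushed onto the stack
    //   (?<-open>[\]}])   a closer, popped from the stack (fails if the stack is empty)
    //   (?(open),|(?!))   a comma, allowed only while the stack is non-empty
    // The trailing (?(open)(?!)) rejects a match that leaves openers unclosed.
    const string Pattern =
        @"(?:[^,\[\]{}]|(?<open>[\[{])|(?<-open>[\]}])|(?(open),|(?!)))+(?(open)(?!))";

    public static string[] Split(string input) =>
        Regex.Matches(input, Pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    static void Main()
    {
        foreach (var item in Split("a,[1,2,3,{4,5},6],b,{c,d,[e,f],g},h"))
            Console.WriteLine(item);
    }
}
```

The pattern is hard to read and hard to maintain, which is a practical argument for the hand-written scan.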

But parsing by hand is not that difficult. First, find the commas that are not inside brackets or braces.

string input = "a,[1,2,3,{4,5},6],b,{c,d,[e,f],g},h";

var delimiterPositions = new List<int>();
int bracesDepth = 0;
int bracketsDepth = 0;

for (int i = 0; i < input.Length; i++)
{
    switch (input[i])
    {
        case '{':
            bracesDepth++;
            break;
        case '}':
            bracesDepth--;
            break;
        case '[':
            bracketsDepth++;
            break;
        case ']':
            bracketsDepth--;
            break;

        default:
            if (bracesDepth == 0 && bracketsDepth == 0 && input[i] == ',')
            {
                delimiterPositions.Add(i);
            }
            break;
    }
}

And then split the string at these positions.

public List<string> SplitAtPositions(string input, List<int> delimiterPositions)
{
    var output = new List<string>();
    int start = 0; // index where the current piece begins

    foreach (int position in delimiterPositions)
    {
        output.Add(input.Substring(start, position - start));
        start = position + 1; // skip the delimiter itself
    }

    // Everything after the last delimiter, or the whole input if there
    // are no delimiters (calling Last() on an empty list would throw).
    output.Add(input.Substring(start));

    return output;
}
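
The two steps can also be folded into a single pass. This is a minimal sketch under the same assumption of well-formed input; the class name `TopLevelSplitter` is made up, and it uses one combined depth counter instead of the two separate counters above, so it does not tell brackets and braces apart.

```csharp
using System;
using System.Collections.Generic;

static class TopLevelSplitter
{
    public static List<string> Split(string input)
    {
        var items = new List<string>();
        int depth = 0; // combined nesting depth of [] and {}
        int start = 0; // index where the current item begins

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == '[' || c == '{')
            {
                depth++;
            }
            else if (c == ']' || c == '}')
            {
                depth--;
            }
            else if (c == ',' && depth == 0)
            {
                // A top-level comma ends the current item.
                items.Add(input.Substring(start, i - start));
                start = i + 1;
            }
        }

        // The remainder after the last top-level comma.
        items.Add(input.Substring(start));
        return items;
    }

    static void Main()
    {
        var items = Split("a,[1,2,3,{4,5},6],b,{c,d,[e,f],g},h");
        Console.WriteLine(string.Join(" | ", items));
        // a | [1,2,3,{4,5},6] | b | {c,d,[e,f],g} | h
    }
}
```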
Sebastian Negraszus answered Nov 05 '22 03:11



Even if it looks ugly and there is no regex involved (not sure if it's a requirement or a nice-to-have in the original question), this alternative should work:

using System.Linq;
using System.Xml.Linq;

class Program
{
    static void Main(string[] args)
    {
        var input = "a,[1,2,3,{4,5}],b,{c,d,[e,f]},g";
        var output = "<root><n>" +
            input.Replace(",", "</n><n>")
            .Replace("[", "<n1><n>")
            .Replace("]", "</n></n1>")
            .Replace("{", "<n2><n>")
            .Replace("}", "</n></n2>") +
            "</n></root>";
        var elements = XDocument
            .Parse(output, LoadOptions.None)
            .Root.Elements()
            .Select(e =>
            {
                if (!e.HasElements)
                    return e.Value;
                else
                {
                    // DisableFormatting avoids indentation and line breaks,
                    // so no whitespace has to be stripped afterwards.
                    return e.ToString(SaveOptions.DisableFormatting)
                        .Replace("</n><n>", ",")
                        .Replace("<n1>", "[")
                        .Replace("</n1>", "]")
                        .Replace("<n2>", "{")
                        .Replace("</n2>", "}")
                        .Replace("<n>", "")
                        .Replace("</n>", "")
                        ;
                }
            }).ToList();
    }
}
Alex Filipovici answered Nov 05 '22 02:11
