
Recursive searching a pattern in a string

Tags:

string

c#

I am using C#. I have the following string:

<li> 
    <a href="abc">P1</a> 
    <ul>
        <li><a href = "bcd">P11</a></li>
        <li><a href = "bcd">P12</a></li>
        <li><a href = "bcd">P13</a></li>
        <li><a href = "bcd">P14</a></li>
    </ul>
</li>
<li> 
    <a href="abc">P2</a> 
    <ul>
        <li><a href = "bcd">P21</a></li>
        <li><a href = "bcd">P22</a></li>
        <li><a href = "bcd">P23</a></li>
    </ul>
</li>
<li> 
    <a href="abc">P3</a> 
    <ul>
        <li><a href = "bcd">P31</a></li>
        <li><a href = "bcd">P32</a></li>
        <li><a href = "bcd">P33</a></li>
        <li><a href = "bcd">P34</a></li>
    </ul>
</li>
<li> 
    <a href="abc">P4</a> 
    <ul>
        <li><a href = "bcd">P41</a></li>
        <li><a href = "bcd">P42</a></li>
    </ul>
</li>

My aim is to fill the following list from the above string.

List<class1>

class1 has two properties:

string parent;
List<string> children;

For each top-level <li>, it should put P1 in parent and P11, P12, P13, P14 in children (and likewise for P2, P3 and P4), and collect the resulting objects into the list.
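For reference, here is a minimal sketch of class1 as described, assuming public fields. The two-argument constructor is inferred from the new class1("P1",str) call in the sample below; the parameterless constructor is an assumption added so that object-initializer syntax (new class1 { ... }) also compiles.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of class1 as described in the question.
public class class1
{
    public string parent;          // e.g. "P1"
    public List<string> children;  // e.g. "P11", "P12", "P13", "P14"

    // Parameterless constructor so object initializers (new class1 { ... }) also work.
    public class1()
    {
        children = new List<string>();
    }

    // Matches the new class1("P1", str) call used in the question's sample.
    public class1(string parent, List<string> children)
    {
        this.parent = parent;
        this.children = children;
    }
}
```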

Any suggestion will be helpful.

Edit

Sample

public List<class1> getElements()
{
    List<class1> temp = new List<class1>();
    // Pseudocode: foreach (<a> element in string) -- this loop is what needs to be written
    {
        // in the recursive loop: collect the child names of the current parent
        List<string> str = new List<string>();
        str.Add("P11");
        str.Add("P12");
        str.Add("P13");
        str.Add("P14");

        class1 obj = new class1("P1", str);
        temp.Add(obj);
    }
    return temp;
}

The values are hard-coded here, but they should be filled in dynamically from the string.

RTRokzzz asked Nov 30 '12 13:11


3 Answers

What you want is a recursive descent parser. All the other suggestions of using libraries are basically suggesting that you use a recursive descent parser for HTML or XML that has been written by others.

The basic structure of a recursive descent parser is to scan a list of tokens (in your case, a string) from left to right and, upon encountering a token that opens a sub-entity (such as <ul>), call the parser again to process that sublist of tokens (the substring) up to the matching closing token.

You can Google the term "recursive descent parser" and find plenty of useful results. Even the Wikipedia article is fairly good in this case, and it includes an example of a recursive descent parser in C.
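To make the idea concrete, here is a minimal recursive descent parser sketch in C# for this particular fragment only. It assumes the input is well-formed, contains only <li>, <a> and <ul> tags, and that an <a> tag's attributes contain no '>' character. MenuParser and Parse are hypothetical names, and class1 is redefined with an assumed (string, List<string>) constructor so the block compiles on its own. ParseList and ParseItem call each other, so a nested <ul> is handled by the same code as the top-level list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the question's class1.
public class class1
{
    public string parent;
    public List<string> children;
    public class1(string parent, List<string> children)
    {
        this.parent = parent;
        this.children = children;
    }
}

// Recursive descent parser for the <li>/<a>/<ul> fragment only.
// Grammar (an assumption based on the sample input):
//   list   := item*
//   item   := "<li>" anchor [ "<ul>" list "</ul>" ] "</li>"
//   anchor := "<a" attributes ">" text "</a>"
public class MenuParser
{
    private readonly string s;
    private int pos;

    private MenuParser(string input) { s = input; pos = 0; }

    public static List<class1> Parse(string input)
    {
        var parser = new MenuParser(input);
        List<class1> items = parser.ParseList();
        parser.SkipWhitespace();
        if (parser.pos != parser.s.Length)
            throw new FormatException($"Unexpected input at position {parser.pos}");
        return items;
    }

    // list := item*  (stops at the first token that is not "<li>")
    private List<class1> ParseList()
    {
        var items = new List<class1>();
        SkipWhitespace();
        while (LookingAt("<li>"))
        {
            items.Add(ParseItem());
            SkipWhitespace();
        }
        return items;
    }

    // item := "<li>" anchor [ "<ul>" list "</ul>" ] "</li>"
    private class1 ParseItem()
    {
        Expect("<li>");
        string name = ParseAnchor();
        var children = new List<string>();
        SkipWhitespace();
        if (LookingAt("<ul>"))
        {
            Expect("<ul>");
            // Recursive call: the nested list uses the same rule as the top level.
            // class1 holds only one level, so keep just each child's own name.
            foreach (class1 child in ParseList())
                children.Add(child.parent);
            Expect("</ul>");
        }
        Expect("</li>");
        return new class1(name, children);
    }

    // anchor: skip the attributes and return the trimmed link text.
    private string ParseAnchor()
    {
        Expect("<a");
        int tagEnd = s.IndexOf('>', pos);
        int closeTag = tagEnd < 0 ? -1 : s.IndexOf("</a>", tagEnd, StringComparison.Ordinal);
        if (closeTag < 0)
            throw new FormatException($"Unterminated <a> element at position {pos}");
        string text = s.Substring(tagEnd + 1, closeTag - tagEnd - 1).Trim();
        pos = closeTag + "</a>".Length;
        return text;
    }

    private void SkipWhitespace()
    {
        while (pos < s.Length && char.IsWhiteSpace(s[pos])) pos++;
    }

    // True if the input at the current position starts with token.
    private bool LookingAt(string token) =>
        s.Length - pos >= token.Length &&
        string.CompareOrdinal(s, pos, token, 0, token.Length) == 0;

    private void Expect(string token)
    {
        SkipWhitespace();
        if (!LookingAt(token))
            throw new FormatException($"Expected '{token}' at position {pos}");
        pos += token.Length;
    }
}
```

Calling MenuParser.Parse(yourInputString) on the question's string is expected to yield four class1 objects, P1 through P4, each with its child names. A hand-written parser like this breaks easily on real-world HTML, which is why the other suggestions point to existing HTML or XML parsers.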

slebetman answered Nov 13 '22 13:11


If you can't use a third-party tool like the Html Agility Pack (my recommendation), you could use the WebBrowser class and the HtmlDocument class from System.Windows.Forms to parse the HTML:

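// Requires: using System.Collections.Generic; using System.Linq; using System.Windows.Forms;
// WebBrowser is a WinForms control and must be created on an STA thread.
// Assumes class1 has public parent/children members and a parameterless constructor.
WebBrowser wbc = new WebBrowser();
wbc.DocumentText = "foo"; // necessary to create the document
HtmlDocument doc = wbc.Document.OpenNew(true);
doc.Write(html); // insert your HTML string here
List<class1> elements = wbc.Document.GetElementsByTagName("li").Cast<HtmlElement>()
    // Outer <li> elements have two children (<a> and <ul>); inner ones have only <a>.
    .Where(li => li.Children.Count == 2)
    .Select(outerLi => new class1
    {
        parent = outerLi.FirstChild.InnerText,
        // The last child is the <ul>; take the <a> text of each nested <li>.
        children = outerLi.Children.Cast<HtmlElement>()
            .Last().Children.Cast<HtmlElement>()
            .Select(innerLi => innerLi.FirstChild.InnerText).ToList()
    }).ToList();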

Here's the result in the debugger window:

[Screenshot: the resulting List<class1> in the debugger window]

Tim Schmelter answered Nov 13 '22 13:11


You can also use XmlDocument, as long as the input is well-formed XML:

// Requires: using System.Xml;
XmlDocument doc = new XmlDocument();
// The string has several top-level <li> elements, and XML allows only one
// root element, so wrap the fragment before loading it.
doc.LoadXml("<root>" + yourInputString + "</root>");
XmlNodeList colNodes = doc.DocumentElement.SelectNodes("li");
foreach (XmlNode node in colNodes)
{
    // ... your logic here
    // for example
    // string parentName = node.SelectSingleNode("a").InnerText;
    // string parentHref = node.SelectSingleNode("a").Attributes["href"].Value;
    // XmlNodeList children = 
    //       node.SelectSingleNode("ul").SelectNodes("li");
    // foreach (XmlNode child in children)
    // {
    //         ......
    // }
}
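For completeness, here is a self-contained sketch that fills in that loop and builds the List<class1> the question asks for. XmlMenuReader and Read are hypothetical names, class1 is assumed to have a (string, List<string>) constructor, and a <li> without a nested <ul> simply gets an empty child list.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical stand-in for the question's class1.
public class class1
{
    public string parent;
    public List<string> children;
    public class1(string parent, List<string> children)
    {
        this.parent = parent;
        this.children = children;
    }
}

public static class XmlMenuReader
{
    // Assumes the fragment is well-formed XML (every tag closed, attributes quoted).
    public static List<class1> Read(string fragment)
    {
        var doc = new XmlDocument();
        // XML permits only one root element, so wrap the sibling <li> elements.
        doc.LoadXml("<root>" + fragment + "</root>");

        var result = new List<class1>();
        foreach (XmlNode li in doc.DocumentElement.SelectNodes("li"))
        {
            var children = new List<string>();
            XmlNode ul = li.SelectSingleNode("ul");
            if (ul != null)
            {
                // Each nested <li> holds one <a>; keep its text.
                foreach (XmlNode a in ul.SelectNodes("li/a"))
                    children.Add(a.InnerText);
            }
            result.Add(new class1(li.SelectSingleNode("a").InnerText, children));
        }
        return result;
    }
}
```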

Arie answered Nov 13 '22 13:11