I am using C#. I have the following string:
<li>
<a href="abc">P1</a>
<ul>
<li><a href = "bcd">P11</a></li>
<li><a href = "bcd">P12</a></li>
<li><a href = "bcd">P13</a></li>
<li><a href = "bcd">P14</a></li>
</ul>
</li>
<li>
<a href="abc">P2</a>
<ul>
<li><a href = "bcd">P21</a></li>
<li><a href = "bcd">P22</a></li>
<li><a href = "bcd">P23</a></li>
</ul>
</li>
<li>
<a href="abc">P3</a>
<ul>
<li><a href = "bcd">P31</a></li>
<li><a href = "bcd">P32</a></li>
<li><a href = "bcd">P33</a></li>
<li><a href = "bcd">P34</a></li>
</ul>
</li>
<li>
<a href="abc">P4</a>
<ul>
<li><a href = "bcd">P41</a></li>
<li><a href = "bcd">P42</a></li>
</ul>
</li>
My aim is to fill the following list from the above string.
List<class1>
class1 has two properties,
string parent;
List<string> children;
It should put P1 in parent and P11, P12, P13, P14 in children, and build a list of such objects, one per top-level item.
Any suggestion will be helpful.
Edit
Sample
public List<class1> getElements()
{
    List<class1> temp = new List<class1>();
    foreach (/* each top-level <a> element in the string */)
    {
        // inside the loop, once per parent
        List<string> str = new List<string>();
        str.Add("P11");
        str.Add("P12");
        str.Add("P13");
        str.Add("P14");
        class1 obj = new class1("P1", str);
        temp.Add(obj);
    }
    return temp;
}
The values are hard-coded here, but in the real code they would come from the string.
What you want is a recursive descent parser. The other answers that suggest a library are, in effect, suggesting a recursive descent parser for HTML or XML that someone else has already written.
The basic structure of a recursive descent parser is a linear scan over a list of tokens (in your case, pieces of the string). When the scan reaches a token that opens a sub-entity, the parser calls itself to process the sublist of tokens (the substring) that makes up that sub-entity.
You can Google the term "recursive descent parser" and find plenty of useful results. Even the Wikipedia article is fairly good in this case and includes an example of a recursive descent parser in C.
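As a rough illustration of the idea above, here is a minimal sketch of a recursive descent parser for the exact markup in the question. It assumes the input only ever contains `<li>`, `<a ...>`, `<ul>` tags and text in the pattern shown; the names `Item` and `ListParser` are made up for this example, and this is not a general HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical tree node for this sketch; not part of the question.
public class Item
{
    public string Text = "";
    public List<Item> Children = new List<Item>();
}

public static class ListParser
{
    // A token is either a tag such as <li>, </ul>, <a href="x"> or a run of text.
    private static readonly Regex TokenPattern = new Regex(@"<[^>]+>|[^<]+");

    public static List<Item> Parse(string html)
    {
        var tokens = new List<string>();
        foreach (Match m in TokenPattern.Matches(html))
        {
            string t = m.Value.Trim();
            if (t.Length > 0) tokens.Add(t);   // drop whitespace between tags
        }
        int pos = 0;
        return ParseItems(tokens, ref pos);
    }

    // items := ( "<li>" item "</li>" )*
    private static List<Item> ParseItems(List<string> tokens, ref int pos)
    {
        var items = new List<Item>();
        while (pos < tokens.Count && tokens[pos] == "<li>")
        {
            pos++;                                   // consume <li>
            items.Add(ParseItem(tokens, ref pos));
            Expect(tokens, ref pos, "</li>");
        }
        return items;
    }

    // item := "<a ...>" text "</a>" [ "<ul>" items "</ul>" ]
    private static Item ParseItem(List<string> tokens, ref int pos)
    {
        if (pos + 1 >= tokens.Count || !tokens[pos].StartsWith("<a", StringComparison.Ordinal))
            throw new FormatException("Expected <a ...> followed by text at token " + pos);
        pos++;                                       // consume <a ...>
        var item = new Item { Text = tokens[pos++] };
        Expect(tokens, ref pos, "</a>");
        if (pos < tokens.Count && tokens[pos] == "<ul>")
        {
            pos++;                                   // consume <ul>
            item.Children = ParseItems(tokens, ref pos);   // the recursive call
            Expect(tokens, ref pos, "</ul>");
        }
        return item;
    }

    private static void Expect(List<string> tokens, ref int pos, string expected)
    {
        if (pos >= tokens.Count || tokens[pos] != expected)
            throw new FormatException("Expected " + expected + " at token " + pos);
        pos++;
    }
}
```

Calling `ListParser.Parse(yourString)` returns one `Item` per top-level `<li>`. To get the list from the question, copy each item's `Text` into `parent` and the `Text` of each entry in its `Children` into `children`.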
If you can't use a third-party tool like my recommended Html Agility Pack, you could use the WebBrowser class and the HtmlDocument class (both in System.Windows.Forms) to parse the HTML:
WebBrowser wbc = new WebBrowser();
wbc.DocumentText = "foo"; // necessary to create the document
HtmlDocument doc = wbc.Document.OpenNew(true);
doc.Write((string)html); // insert your html-string here
List<class1> elements = wbc.Document.GetElementsByTagName("li").Cast<HtmlElement>()
.Where(li => li.Children.Count == 2)
.Select(outerLi => new class1
{
parent = outerLi.FirstChild.InnerText,
children = outerLi.Children.Cast<HtmlElement>()
.Last().Children.Cast<HtmlElement>()
.Select(innerLi => innerLi.FirstChild.InnerText).ToList()
}).ToList();
Here's the result in the debugger window:
You can also use XmlDocument:
XmlDocument doc = new XmlDocument();
// LoadXml needs a single root element, so wrap the fragment in one
doc.LoadXml("<root>" + yourInputString + "</root>");
XmlNodeList colNodes = doc.SelectNodes("/root/li");
foreach (XmlNode node in colNodes)
{
// ... your logic here
// for example
// string parentName = node.SelectSingleNode("a").InnerText;
// string parentHref = node.SelectSingleNode("a").Attributes["href"].Value;
// XmlNodeList children =
// node.SelectSingleNode("ul").SelectNodes("li");
// foreach (XmlNode child in children)
// {
// ......
// }
}
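For completeness, here is one way the XmlDocument approach might be fleshed out to fill the list from the question. This is a sketch, not the only way to do it: it assumes the string is well-formed XML (real-world HTML often is not), and because the string has several top-level `<li>` elements while `LoadXml` expects a single root, it wraps the input in a made-up `<root>` element. The `class1` shape follows the question; `XmlListReader` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Shape taken from the question: one parent label plus its child labels.
public class class1
{
    public string parent;
    public List<string> children;
}

public static class XmlListReader
{
    public static List<class1> GetElements(string input)
    {
        var doc = new XmlDocument();
        // <root> is an artificial wrapper so the fragment has a single root.
        doc.LoadXml("<root>" + input + "</root>");

        var result = new List<class1>();
        foreach (XmlNode li in doc.SelectNodes("/root/li"))
        {
            XmlNode anchor = li.SelectSingleNode("a");
            if (anchor == null) continue;            // skip items without a link

            var entry = new class1
            {
                parent = anchor.InnerText,
                children = new List<string>()
            };
            // Links one level down: <li> -> <ul> -> <li> -> <a>
            foreach (XmlNode child in li.SelectNodes("ul/li/a"))
                entry.children.Add(child.InnerText);

            result.Add(entry);
        }
        return result;
    }
}
```

With the string from the question, `XmlListReader.GetElements(yourInputString)` should return four entries, the first with `parent` set to P1 and `children` holding P11 through P14.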