There's a bug in Traverse()
that's causing it to iterate nodes more than once.
public IEnumerable<HtmlNode> Traverse()
{
foreach (var node in _context)
{
yield return node;
foreach (var child in Children().Traverse())
yield return child;
}
}
public SharpQuery Children()
{
return new SharpQuery(_context.SelectMany(n => n.ChildNodes).Where(n => n.NodeType == HtmlNodeType.Element), this);
}
public SharpQuery(IEnumerable<HtmlNode> nodes, SharpQuery previous = null)
{
if (nodes == null) throw new ArgumentNullException("nodes");
_previous = previous;
_context = new List<HtmlNode>(nodes);
}
static void Main(string[] args)
{
var sq = new SharpQuery(@"
<a>
<b>
<c/>
<d/>
<e/>
<f>
<g/>
<h/>
<i/>
</f>
</b>
</a>");
var nodes = sq.Traverse();
Console.WriteLine("{0} nodes: {1}", nodes.Count(), string.Join(",", nodes.Select(n => n.Name)));
Console.ReadLine();
19 nodes: #document,a,b,c,g,h,i,d,g,h,i,e,g,h,i,f,g,h,i
Each letter a-i printed once.
Can't seem to figure out where's it going wrong... node.ChildNodes
does return just direct children, right? (from HtmlAgilityPack)
Full class (condensed) if you want to try and run it yourself.
public class SQLite
{
private readonly List<HtmlNode> _context = new List<HtmlNode>();
private readonly SQLite _previous = null;
public SQLite(string html)
{
var doc = new HtmlDocument();
doc.LoadHtml(html);
_context.Add(doc.DocumentNode);
}
public SQLite(IEnumerable<HtmlNode> nodes, SQLite previous = null)
{
if (nodes == null) throw new ArgumentNullException("nodes");
_previous = previous;
_context = new List<HtmlNode>(nodes);
}
public IEnumerable<HtmlNode> Traverse()
{
foreach (var node in _context)
{
yield return node;
foreach (var child in Children().Traverse())
yield return child;
}
}
public SQLite Children()
{
return new SQLite(_context.SelectMany(n => n.ChildNodes).Where(n => n.NodeType == HtmlNodeType.Element), this);
}
}
First, note that everything goes according to plan as long as we're iterating over elements that don't have any sibling. As soon as we hit element <c>
, things start going haywire. It's also interesting that <c>
, <d>
and <e>
all think they contain <f>
's children.
Let's take a closer look at your call to SelectMany()
:
_context.SelectMany(n => n.ChildNodes)
That iterates through the items in _context
and accumulates the child nodes of each item. Let's have a look at _context
. Everything should be okay, since it's a List
of length 1
. Or is it?
I suspect your SharpQuery(string)
constructor stores sibling elements inside the same list. In that case, _context
may not be of length 1
anymore and, remember, SelectMany()
accumulates the child nodes of each item in the list.
For example, if _context
is a list containing <c>
, <d>
, <e>
and <f>
, only <f>
has children, and SelectMany()
is called once for each element, it will accumulate and return the children of <f>
four times.
I think that's your bug.
EDIT: Since you've posted the full class, I don't have to guess anymore. Look at the sequence of operations when you iterate over <b>
(iterators replaced by lists for better comprehension):
Children()
on <b>
, which returns <c>
, <d>
, <e>
and <f>
,Traverse()
on the resulting list:
Children()
on <c>
, but _context
actually contains <c>
, <d>
, <e>
and <f>
, not only <c>
, so that returns <g>
, <h>
and <i>
,Children()
on <d>
, same thing since _context
is the same for both <c>
and <d>
(and <e>
, and <f>
),Children() is broken, it selects children of siblings as well. I'd rewrite to something like this:
public IEnumerable<HtmlNode> Traverse(IEnumerable<HtmlNode> nodes)
{
foreach (var node in nodes)
{
yield return node;
foreach (var child in Traverse(ChildNodes(node)))
yield return child;
}
}
private IEnumerable<HtmlNode> ChildNodes(HtmlNode node)
{
return node.ChildNodes.Where(n => n.NodeType == HtmlNodeType.Element);
}
public SharpQuery(IEnumerable<HtmlNode> nodes, SharpQuery previous = null)
{
if (nodes == null) throw new ArgumentNullException("nodes");
_previous = previous; // Is this necessary?
_context = new List<HtmlNode>(nodes);
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With