I have a table like this:
<table>
<tr class="odd">
<td class="ind gray">1</td>
<td><b>acceding</b></td>
<td class="transcr">[əksˈiːdɪŋ]</td>
<td class="tran">присоединения</td>
</tr>
<!-- .... -->
<tr class="odd">
<td class="ind gray">999</td>
<td><b>related</b></td>
<td class="transcr">[rɪlˈeɪːtɪd]</td>
<td class="tran">родственный</td>
</tr>
</table>
I want to parse three of the "td" cells in each row. My code:
Dictionary<string, Word> words = new Dictionary<string, Word>();
string text = webBrowser1.DocumentText;
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(text);
for (int i = 0; i < doc.DocumentNode.SelectNodes("//tr").Count; i++)
{
    HtmlNode node = doc.DocumentNode.SelectNodes("//tr")[i];
    Word word = null;
    if (TryParseWord(node, out word))
    {
        try
        {
            if (!words.ContainsKey(word.eng))
            {
                words.Add(word.eng, word);
            }
        }
        catch
        { continue; }
    }
}
And the function for parsing:
private bool TryParseWord(HtmlNode node, out Word word)
{
    word = null;
    try
    {
        var eng = node.SelectNodes("//td")[1].InnerText;
        var trans = node.SelectNodes("//td")[2].InnerText;
        var rus = node.SelectNodes("//td")[3].InnerText;
        word = new Word();
        word.eng = eng;
        word.rus = rus;
        word.trans = trans;
        return true;
    }
    catch
    {
        word = null;
        return false;
    }
}
In my TryParseWord method I only ever get the values from the first row. How can I fix this?
You can get the values easily this way (it needs using System.Linq):
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
var table = doc.DocumentNode
    .Descendants("tr")
    .Select(n => n.Elements("td").Select(e => e.InnerText).ToArray());
And usage:
foreach (var tr in table)
{
    Console.WriteLine("{0} {1} {2} {3}", tr[0], tr[1], tr[2], tr[3]);
}
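The XPath "//td" always searches from the document root, even when you call it on a row node, so indexes 1 to 3 always point at the cells of the first row. You have to change the XPath so that it is relative to the current node. Like this: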
node.SelectNodes(".//td")[1]
The leading dot tells XPath to match only within the current node.
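To show how the relative XPath fits into the code from the question, here is a minimal sketch rather than a definitive implementation. The Word class is not shown in the question, so the one below is a hypothetical stand-in with the three fields the question uses; the WordTableParser class and ParseWords method are also names made up for illustration. It additionally checks for null, since SelectNodes returns null when nothing matches, instead of relying on an empty catch block.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical stand-in for the Word class, which the question does not show.
public class Word
{
    public string eng;
    public string trans;
    public string rus;
}

// Illustrative helper class; the name is an assumption, not from the question.
public static class WordTableParser
{
    // Parses a single <tr>. The leading "." keeps the XPath relative to this row.
    public static bool TryParseWord(HtmlNode row, out Word word)
    {
        word = null;

        // SelectNodes returns null when no node matches.
        HtmlNodeCollection cells = row.SelectNodes(".//td");
        if (cells == null || cells.Count < 4)
        {
            return false;
        }

        word = new Word();
        word.eng = cells[1].InnerText.Trim();
        word.trans = cells[2].InnerText.Trim();
        word.rus = cells[3].InnerText.Trim();
        return true;
    }

    // Walks every row once and keeps the first entry for each English word.
    public static Dictionary<string, Word> ParseWords(string html)
    {
        var words = new Dictionary<string, Word>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows == null)
        {
            return words;
        }

        foreach (HtmlNode row in rows)
        {
            Word word;
            if (TryParseWord(row, out word) && !words.ContainsKey(word.eng))
            {
                words.Add(word.eng, word);
            }
        }
        return words;
    }
}
```

Calling WordTableParser.ParseWords(webBrowser1.DocumentText) would then replace the for loop from the question. Selecting the rows once, instead of calling SelectNodes("//tr") on every iteration, also avoids re-running the query for each row.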