How would I effectively parse the href attribute value from this HTML?
    <tr>
      <td rowspan="1" colspan="1">7</td>
      <td rowspan="1" colspan="1">
        <a class="undMe" href="/ice/player.htm?id=8475179" rel="skaterLinkData" shape="rect">D. Kulikov</a>
      </td>
      <td rowspan="1" colspan="1">D</td>
      <td rowspan="1" colspan="1">0</td>
      <td rowspan="1" colspan="1">0</td>
      <td rowspan="1" colspan="1">0</td>
      [...]
I am interested in getting the player id, which is 8475179.

Here is the code I have so far:
    // Iterate all rows (players)
    for (int i = 1; i < rows.Count; ++i)
    {
        HtmlNodeCollection cols = rows[i].SelectNodes(".//td");

        // new player
        Dim_Player player = new Dim_Player();

        // Iterate all columns in this row
        for (int j = 1; j < 6; ++j)
        {
            switch (j)
            {
                case 1:
                    player.Name = cols[j].InnerText;
                    player.Player_id = Int32.Parse(/* this is where I want to parse the href value */);
                    break;
                case 2:
                    player.Position = cols[j].InnerText;
                    break;
                case 3:
                    stats.Goals = Int32.Parse(cols[j].InnerText);
                    break;
                case 4:
                    stats.Assists = Int32.Parse(cols[j].InnerText);
                    break;
                case 5:
                    stats.Points = Int32.Parse(cols[j].InnerText);
                    break;
            }
        }
    }
Based on your example this worked for me:
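    using System;
    using System.Linq;
    using HtmlAgilityPack;

    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.Load("test.html");

    // Find the first <a> element whose class attribute is exactly "undMe"
    var link = htmlDoc.DocumentNode
        .Descendants("a")
        .First(x => x.Attributes["class"] != null
                 && x.Attributes["class"].Value == "undMe");

    // The href looks like "/ice/player.htm?id=8475179"; take the part after '='
    string hrefValue = link.Attributes["href"].Value;
    long playerId = Convert.ToInt64(hrefValue.Split('=')[1]);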
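For real use you need to add error checking: First throws if no matching link exists, Attributes["href"] is null if the attribute is missing, and Split('=')[1] fails if the href has no '='.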
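As a rough sketch of what that error checking could look like, the helper below pulls the id out of an href string without throwing. The names PlayerLinks and TryGetPlayerId are made up for this illustration, and it assumes the id is always a run of digits in an `id` query parameter, as in the sample row. It uses only the base class library, so it works on whatever string you get from the href attribute.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class PlayerLinks
{
    // Captures the digits of an "id" query parameter,
    // e.g. "?id=8475179" or "&id=8475179", up to '&', '#' or end of string.
    private static readonly Regex IdPattern =
        new Regex(@"[?&]id=(\d+)(?=&|#|$)", RegexOptions.Compiled);

    // Returns true and sets playerId when href carries a numeric id parameter;
    // returns false (playerId = 0) for null, empty, or non-matching input.
    public static bool TryGetPlayerId(string href, out long playerId)
    {
        playerId = 0;
        if (string.IsNullOrEmpty(href))
            return false;

        Match match = IdPattern.Match(href);
        return match.Success
            && long.TryParse(match.Groups[1].Value, out playerId);
    }
}
```

In the question's loop, case 1 could then look up the anchor inside the cell (for example with `cols[j].SelectSingleNode(".//a")`, which returns null when there is none), pass its href value to TryGetPlayerId, and only assign player.Player_id when the call returns true.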
Use an XPath expression to find it:
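    // Requires: using System; using System.Text.RegularExpressions; using HtmlAgilityPack;
    // doc is an already loaded HtmlDocument.
    // Note: SelectNodes returns null (not an empty collection) when nothing matches.
    foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@class='undMe']"))
    {
        HtmlAttribute att = link.Attributes["href"];

        // Extract the digits that follow "?id=" or "&id="
        Console.WriteLine(new Regex(@"(?<=[\?&]id=)\d+(?=\&|\#|$)").Match(att.Value).Value);
    }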