HTML Agility Pack: parsing an href attribute


How would I parse the value of the href attribute from this HTML:

    <tr>
      <td rowspan="1" colspan="1">7</td>
      <td rowspan="1" colspan="1">
        <a class="undMe" href="/ice/player.htm?id=8475179" rel="skaterLinkData" shape="rect">D. Kulikov</a>
      </td>
      <td rowspan="1" colspan="1">D</td>
      <td rowspan="1" colspan="1">0</td>
      <td rowspan="1" colspan="1">0</td>
      <td rowspan="1" colspan="1">0</td>
      [...]

I am interested in getting the player id, which here is 8475179. Here is the code I have so far:

    // Iterate all rows (players)
    for (int i = 1; i < rows.Count; ++i)
    {
        HtmlNodeCollection cols = rows[i].SelectNodes(".//td");

        // new player
        Dim_Player player = new Dim_Player();

        // Iterate all columns in this row
        for (int j = 1; j < 6; ++j)
        {
            switch (j)
            {
                case 1:
                    player.Name = cols[j].InnerText;
                    player.Player_id = Int32.Parse(/* this is where I want to parse the href value */);
                    break;
                case 2: player.Position = cols[j].InnerText; break;
                case 3: stats.Goals = Int32.Parse(cols[j].InnerText); break;
                case 4: stats.Assists = Int32.Parse(cols[j].InnerText); break;
                case 5: stats.Points = Int32.Parse(cols[j].InnerText); break;
            }
        }
    }
asked Dec 13 '11 at 23:12 by JF Beaulieu



2 Answers

Based on your example, this worked for me:

    using System;
    using System.Linq;
    using HtmlAgilityPack;

    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.Load("test.html");

    var link = htmlDoc.DocumentNode
                      .Descendants("a")
                      .First(x => x.Attributes["class"] != null
                               && x.Attributes["class"].Value == "undMe");

    string hrefValue = link.Attributes["href"].Value;
    long playerId = Convert.ToInt64(hrefValue.Split('=')[1]);

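For real use you will need to add error checking: for example, `First` throws an `InvalidOperationException` if no matching link exists, and `Split('=')[1]` assumes the href contains exactly one `=`.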
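As a minimal sketch of what that error checking might look like, the hypothetical `PlayerHref.TryGetPlayerId` helper below (not part of HtmlAgilityPack) reads the `id` query parameter from the href and returns `null` instead of throwing when it is missing or not numeric. It assumes the href keeps the `...?id=NNN` shape shown in the question and uses only the base class library, so it can be tested without loading a document.

```csharp
using System;

public static class PlayerHref
{
    // Hypothetical helper (not part of HtmlAgilityPack): extracts the numeric
    // "id" query parameter from an href such as "/ice/player.htm?id=8475179".
    // Returns null instead of throwing when the href has no usable id.
    public static long? TryGetPlayerId(string href)
    {
        if (string.IsNullOrEmpty(href))
            return null;

        int queryStart = href.IndexOf('?');
        if (queryStart < 0)
            return null;

        // Drop any "#fragment" that follows the query string.
        string query = href.Substring(queryStart + 1);
        int hash = query.IndexOf('#');
        if (hash >= 0)
            query = query.Substring(0, hash);

        // Check each name=value pair instead of assuming "id" is the only one.
        foreach (string pair in query.Split('&'))
        {
            string[] parts = pair.Split(new[] { '=' }, 2);
            if (parts.Length == 2 && parts[0] == "id" && long.TryParse(parts[1], out long id))
                return id;
        }
        return null;
    }
}
```

With the `link` found above, this could be used as `long? playerId = PlayerHref.TryGetPlayerId(link.GetAttributeValue("href", ""));`, skipping rows where the result is `null`.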

answered Sep 24 '22 at 15:09 by BrokenGlass



Use an XPath expression to find the link, then a regular expression to pull the id out of its href:

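    using System;
    using System.Text.RegularExpressions;
    using HtmlAgilityPack;

    // SelectNodes returns null (not an empty collection) when nothing matches
    HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@class='undMe']");
    if (links != null)
    {
        foreach (HtmlNode link in links)
        {
            HtmlAttribute att = link.Attributes["href"];
            Console.WriteLine(new Regex(@"(?<=[\?&]id=)\d+(?=\&|\#|$)").Match(att.Value).Value);
        }
    }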
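The regular expression does the real work here: the lookbehind `(?<=[\?&]id=)` requires `?id=` or `&id=` right before the digits, and the lookahead `(?=\&|\#|$)` requires `&`, `#` or the end of the string right after them, so `Match.Value` contains only the digits. A small sketch, assuming a hypothetical `HrefIdRegex.ExtractId` helper that builds the pattern once instead of once per link:

```csharp
using System;
using System.Text.RegularExpressions;

public static class HrefIdRegex
{
    // Same pattern as the answer: lookbehind for "?id=" or "&id=",
    // one or more digits, then lookahead for "&", "#" or end of string.
    private static readonly Regex IdPattern =
        new Regex(@"(?<=[\?&]id=)\d+(?=\&|\#|$)", RegexOptions.Compiled);

    // Hypothetical helper: returns the id digits, or null if the pattern does not match.
    public static string ExtractId(string href)
    {
        Match m = IdPattern.Match(href ?? string.Empty);
        return m.Success ? m.Value : null;
    }
}
```

Because of the `[\?&]` guard, an href such as `/x?pid=7` does not match, and `?id=12abc` does not match either because the lookahead rejects the trailing letters.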
answered Sep 25 '22 at 15:09 by csharptest.net
