
HTML XPath Searching by class name

I have a problem with XPath in C#.
I want to find all elements that have the structure below.
I have 10 links, all of which have this structure:

<div class="PartialSearchResults-item" data-zen="true">
    <div class="PartialSearchResults-item-title">
        <a class="PartialSearchResults-item-title-link result-link" target="_blank" href='https://www.google.com/'> Google</a>
    </div>
    <p class="PartialSearchResults-item-url">www.google.com</p>
    <p class="PartialSearchResults-item-abstract">Search the world.</p>
</div>

For example, from this sample I want to get "Google", "www.google.com" and "Search the world."

var titles = hd.DocumentNode.SelectNodes("//div[contains(@class, 'PartialSearchResults-item')]");
string link;
foreach (HtmlNode node in titles)
{
    string description = node.SelectSingleNode(".//*[contains(@class,'PartialSearchResults-item-abstract')]").InnerText;
    link = node.SelectSingleNode(".//*[contains(@class,'PartialSearchResults-item-url')]").InnerText;
    string title = node.SelectSingleNode(".//a[contains(@class,'PartialSearchResults-item-title-link result-link')]").InnerText;
}

But I get a null reference error (NullReferenceException).

mary asked Feb 04 '23 12:02



1 Answer

The problem is in the query that selects the titles. You are looking for div elements whose class attribute contains PartialSearchResults-item, which is the class of each item's root node. But other nodes satisfy that query too: the div with class PartialSearchResults-item-title also contains that substring, so it matches as well.

After selecting these two divs, you iterate over them and try to read some child nodes. The first iteration works, because the node is the item's root. In the second iteration the node is the div with class PartialSearchResults-item-title, which has only a single a child, so SelectSingleNode returns null when you query for the description, and reading the InnerText property of that null result throws a NullReferenceException:

string description = node.SelectSingleNode(".//*[contains(@class,'PartialSearchResults-item-abstract')]").InnerText;
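
To see the over-match concretely, here is a minimal sketch that lists the class attribute of every node a query selects. The class name ContainsOverMatchDemo, the helper MatchedClasses and the SampleHtml constant are made up for this illustration (they are not part of HtmlAgilityPack), and the markup is a tidied copy of the sample from the question.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ContainsOverMatchDemo
{
    // Tidied copy of the markup from the question (illustrative only).
    public const string SampleHtml = @"
<div class=""PartialSearchResults-item"" data-zen=""true"">
  <div class=""PartialSearchResults-item-title"">
    <a class=""PartialSearchResults-item-title-link result-link"" target=""_blank"" href=""https://www.google.com/""> Google</a>
  </div>
  <p class=""PartialSearchResults-item-url"">www.google.com</p>
  <p class=""PartialSearchResults-item-abstract"">Search the world.</p>
</div>";

    // Hypothetical helper: returns the class attribute of each node
    // selected by the given XPath query, in document order.
    public static List<string> MatchedClasses(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var classes = new List<string>();
        // By default SelectNodes returns null, not an empty collection,
        // when nothing matches, so guard before iterating.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return classes;
        }

        foreach (HtmlNode node in nodes)
        {
            classes.Add(node.GetAttributeValue("class", string.Empty));
        }
        return classes;
    }
}
```

Calling MatchedClasses(ContainsOverMatchDemo.SampleHtml, "//div[contains(@class, 'PartialSearchResults-item')]") should give two entries, PartialSearchResults-item and PartialSearchResults-item-title, which are the two nodes the loop in the question iterates over.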

I would suggest not using contains here. In your case the root node has only one class, PartialSearchResults-item, so you can query it with an exact match like this:

var titles = hd.DocumentNode.SelectNodes("//div[@class='PartialSearchResults-item']");
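
The exact match above relies on the root div having exactly that one class. If the element might carry additional classes, a common XPath 1.0 idiom is to match a whole class token by padding the attribute with spaces. The sketch below is one possible way to combine that idiom with null checks. SearchResultParser, HasClass, Parse and TextOf are hypothetical names chosen for this example, and the p and a element names are assumed from the sample markup in the question.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class SearchResultParser
{
    // Hypothetical helper: builds an XPath predicate that matches one whole
    // class token, so "PartialSearchResults-item" does not also match
    // "PartialSearchResults-item-title".
    public static string HasClass(string className)
    {
        return "contains(concat(' ', normalize-space(@class), ' '), ' " + className + " ')";
    }

    public static List<(string Title, string Url, string Description)> Parse(string html)
    {
        var results = new List<(string Title, string Url, string Description)>();

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select only the root div of each result item.
        HtmlNodeCollection items = doc.DocumentNode.SelectNodes(
            "//div[" + HasClass("PartialSearchResults-item") + "]");
        if (items == null)
        {
            return results; // nothing matched
        }

        foreach (HtmlNode item in items)
        {
            string title = TextOf(item, ".//a[" + HasClass("PartialSearchResults-item-title-link") + "]");
            string url = TextOf(item, ".//p[" + HasClass("PartialSearchResults-item-url") + "]");
            string description = TextOf(item, ".//p[" + HasClass("PartialSearchResults-item-abstract") + "]");
            results.Add((title, url, description));
        }
        return results;
    }

    // Returns the trimmed, entity-decoded text of the first match,
    // or an empty string when the child node is missing.
    private static string TextOf(HtmlNode parent, string xpath)
    {
        HtmlNode node = parent.SelectSingleNode(xpath);
        return node == null ? string.Empty : HtmlEntity.DeEntitize(node.InnerText).Trim();
    }
}
```

With the sample markup from the question, Parse should return a single entry with the title "Google", the URL "www.google.com" and the description "Search the world.". Returning an empty string for a missing child is a design choice for this sketch; throwing or skipping the item may suit other cases better.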
Ruben Vardanyan answered Feb 07 '23 13:02
