I am trying to read URLs and collect the images on each page. I need to skip a page if it returns a 404, so that no images are collected from a 404 error page. How can I do this with HtmlAgilityPack? Here is my code:
var document = new HtmlWeb().Load(completeurl);
var urls = document.DocumentNode.Descendants("img")
.Select(e => e.GetAttributeValue("src", null))
.Where(s => !String.IsNullOrEmpty(s)).ToList();
You'll need to register a PostRequestHandler event on the HtmlWeb instance. It is raised after each downloaded document and gives you access to the HttpWebResponse object, which has a StatusCode property.
HtmlWeb web = new HtmlWeb();
HttpStatusCode statusCode = HttpStatusCode.OK;
// Remember the status code of the most recent response.
web.PostRequestHandler += (request, response) =>
{
if (response != null)
{
statusCode = response.StatusCode;
}
};
var doc = web.Load(completeUrl);
if (statusCode == HttpStatusCode.OK)
{
// received a real document
}
Looking at the HtmlAgilityPack source on GitHub, it's even simpler: HtmlWeb has a StatusCode property, which is set from the response:
var web = new HtmlWeb();
var document = web.Load(completeurl);
if (web.StatusCode == HttpStatusCode.OK)
{
var urls = document.DocumentNode.Descendants("img")
.Select(e => e.GetAttributeValue("src", null))
.Where(s => !String.IsNullOrEmpty(s)).ToList();
}
There has been an update to the AgilityPack API. The trick is still the same:
var htmlWeb = new HtmlWeb();
var lastStatusCode = HttpStatusCode.OK;
htmlWeb.PostResponse = (request, response) =>
{
if (response != null)
{
lastStatusCode = response.StatusCode;
}
};
Be aware of the version you use! I am using HtmlAgilityPack v1.5.1 and there is no PostRequestHandler event. In v1.5.1 one has to use the PostResponse field. See the example below.
var htmlWeb = new HtmlWeb();
var lastStatusCode = HttpStatusCode.OK;
htmlWeb.PostResponse = (request, response) =>
{
if (response != null)
{
lastStatusCode = response.StatusCode;
}
};
The differences are small, but they are there.
I hope this saves someone some time.
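The snippets above only capture the status code, so here is a minimal sketch of how the PostResponse variant might be combined with the image extraction from the question. The class name ImageScraper and the methods ExtractImageUrls and GetImageUrls are hypothetical names made up for illustration, not part of HtmlAgilityPack, and the sketch assumes a version where PostResponse is an assignable field as described for v1.5.1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

public static class ImageScraper
{
    // Hypothetical helper: collects the non-empty src value of every <img> element.
    public static List<string> ExtractImageUrls(HtmlDocument document)
    {
        return document.DocumentNode.Descendants("img")
            .Select(e => e.GetAttributeValue("src", null))
            .Where(s => !String.IsNullOrEmpty(s))
            .ToList();
    }

    // Hypothetical helper: returns an empty list unless the page came back with 200 OK.
    public static List<string> GetImageUrls(string completeUrl)
    {
        var htmlWeb = new HtmlWeb();
        var lastStatusCode = HttpStatusCode.OK;

        // Assumption: PostResponse is an assignable field, as in v1.5.1.
        htmlWeb.PostResponse = (request, response) =>
        {
            if (response != null)
            {
                lastStatusCode = response.StatusCode;
            }
        };

        var document = htmlWeb.Load(completeUrl);

        // Skip 404 (and any other non-OK) pages entirely.
        return lastStatusCode == HttpStatusCode.OK
            ? ExtractImageUrls(document)
            : new List<string>();
    }
}
```

Calling ImageScraper.GetImageUrls(completeurl) should then give the image URLs for a normal page and an empty list for a 404 page. On older versions, the same idea should work with the PostRequestHandler or web.StatusCode approach shown earlier.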