This pertains to cookies set inside a script (for example, inside a script tag). System.Windows.Forms.HtmlDocument executes those scripts, and the cookies they set (like document.cookie=...) can be retrieved through its Cookie property.
I assume HtmlAgilityPack.HtmlDocument doesn't do this (execute scripts). I wonder if there is an easy way to emulate the System.Windows.Forms.HtmlDocument capabilities (the cookies part).
Anyone?
When I need to use cookies and HtmlAgilityPack together, or just create custom requests (for example, to set the User-Agent property), here is what I do: I write a WebQuery class that declares a CookieContainer variable and a method with this signature:
...
public HtmlAgilityPack.HtmlDocument GetSource(string url);
What do we need to do inside this method?
Well, using HttpWebRequest and HttpWebResponse, build the HTTP request manually (there are several examples of how to do this on the Internet), then create an HtmlDocument instance and call its Load method with a stream.
What stream do we have to use? Well, the one returned by:
httpResponse.GetResponseStream();
If you use HttpWebRequest to make the query, you can set its CookieContainer property to the variable you declared before every time you access a new page. That way, all cookies set by the sites you access are stored in the CookieContainer variable declared in your WebQuery class, provided you use only one instance of the WebQuery class.
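The steps above could be sketched roughly as follows. This is only a minimal sketch, not a tested implementation: the field name _cookies and the User-Agent string are placeholders I made up, and error handling is left out. Note that a CookieContainer only picks up cookies sent in HTTP response headers, so cookies set by scripts through document.cookie will still not appear in it.

```csharp
using System.IO;
using System.Net;

// Hypothetical sketch of the WebQuery class described above.
public class WebQuery
{
    // One container shared by every request made through this instance,
    // so cookies received from one response are sent with the next request.
    private readonly CookieContainer _cookies = new CookieContainer();

    public HtmlAgilityPack.HtmlDocument GetSource(string url)
    {
        // Build the request manually so that headers and cookies can be customized.
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = _cookies;
        request.UserAgent = "Mozilla/5.0 (compatible; WebQuerySketch)"; // placeholder value

        using (var response = (HttpWebResponse)request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        {
            // HtmlDocument has a parameterless constructor; the markup is
            // read from the response stream through Load.
            var doc = new HtmlAgilityPack.HtmlDocument();
            doc.Load(stream);
            return doc;
        }
    }
}
```

Reusing a single WebQuery instance for all pages is what keeps the session alive, since a new instance would start with an empty container.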
I hope you find this explanation useful. Keep in mind that with this approach you can do whatever you want, whether HtmlAgilityPack supports it or not.
I also worked with Rohit Agarwal's BrowserSession class together with HtmlAgilityPack. But for me, subsequent calls of the Get function didn't work, because new cookies were set every time. That's why I added some functions of my own. (My solution is far from perfect; it's just a quick and dirty fix.) But it worked for me, and if you don't want to spend a lot of time investigating the BrowserSession class, here is what I did:
The added/modified functions are the following:
class BrowserSession
{
    private bool _isPost;
    private HtmlDocument _htmlDoc;
    // This is the new CookieContainer. It is initialized here so it is never null.
    public CookieContainer cookiePot = new CookieContainer();
    ...
    // Like Get, but sends the cookies collected in cookiePot with the request.
    public string Get2(string url)
    {
        HtmlWeb web = new HtmlWeb();
        web.UseCookies = true;
        web.PreRequest = new HtmlWeb.PreRequestHandler(OnPreRequest2);
        web.PostResponse = new HtmlWeb.PostResponseHandler(OnAfterResponse2);
        HtmlDocument doc = web.Load(url);
        return doc.DocumentNode.InnerHtml;
    }

    // Attach the shared cookie container before the request is sent.
    public bool OnPreRequest2(HttpWebRequest request)
    {
        request.CookieContainer = cookiePot;
        return true;
    }

    protected void OnAfterResponse2(HttpWebRequest request, HttpWebResponse response)
    {
        // do nothing
    }

    private void SaveCookiesFrom(HttpWebResponse response)
    {
        if (response.Cookies.Count > 0)
        {
            if (Cookies == null)
            {
                Cookies = new CookieCollection();
            }
            Cookies.Add(response.Cookies);
            cookiePot.Add(Cookies); // -> add the Cookies to the cookiePot
        }
    }
}
What it does: it basically saves the cookies from the initial POST response and assigns the same CookieContainer to each later request. I do not fully understand why the initial version did not work, because its AddCookiesTo function does something similar (if (Cookies != null && Cookies.Count > 0) request.CookieContainer.Add(Cookies);). Anyhow, with these added functions it should work.
It can be used like this:
//initial "Login-procedure"
BrowserSession b = new BrowserSession();
b.Get("http://www.blablubb/login.php");
b.FormElements["username"] = "yourusername";
b.FormElements["password"] = "yourpass";
string response = b.Post("http://www.blablubb/login.php");
All subsequent calls should use:
response = b.Get2("http://www.blablubb/secondpageyouwannabrowseto");
response = b.Get2("http://www.blablubb/thirdpageyouwannabrowseto");
...
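To see why reusing one CookieContainer is enough, here is a small standalone sketch that needs no network access. The cookie name, value and host are made up for illustration; the point is only that a cookie stored in the container is offered again for any later URL on the same host.

```csharp
using System;
using System.Net;

class CookiePotDemo
{
    static void Main()
    {
        var cookiePot = new CookieContainer();

        // Stand-in for a cookie returned by the login response (hypothetical values).
        var fromLogin = new CookieCollection
        {
            new Cookie("session", "abc123", "/", "www.example.com")
        };
        cookiePot.Add(fromLogin);

        // Ask the container which Cookie header it would send for a later page.
        var secondPage = new Uri("http://www.example.com/secondpage");
        Console.WriteLine(cookiePot.GetCookieHeader(secondPage));
    }
}
```

This is the same mechanism OnPreRequest2 relies on: once request.CookieContainer points at cookiePot, the stored cookies are attached to the request automatically.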
I hope it helps when you're facing the same problem.