
HTML Agility Pack settings

Tags:

html

c#

I am using Html Agility Pack to parse HTML, following this question: What is the best way to parse html in C#? I am getting great results :) The problem comes when I visit webpages where the results are based on my location. For example, since I am in Spain, I get the results for the Spain region, and I would like to get them as if I were in England. How can this be done? Is it something I have to change in the user agent? (I use “Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:x.x.x) Gecko/20041107 Firefox/x.x” as the user agent.)

jobormo asked Feb 26 '13 22:02



2 Answers

You could use WebClient, which allows you to set HTTP request headers: download the contents of the web page with its DownloadString method and then feed the result to Html Agility Pack.

The User-Agent header is not what controls the language; the Accept-Language header is. So for example:

using System.Net;
using HtmlAgilityPack;

using (var client = new WebClient())
{
    // Ask for British English, since the goal is to get the results for England
    // (use "es-ES" instead to ask for Spanish content)
    client.Headers[HttpRequestHeader.AcceptLanguage] = "en-GB";
    client.Headers[HttpRequestHeader.UserAgent] = "some user agent if you wish";
    string html = client.DownloadString("http://example.com");

    // feed the HTML to HTML Agility Pack
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    // now do the parsing
}

But if the site uses IP-based recognition to decide which language or region of content to send you, there's not much you can do from the client side to change that.
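For the header-based case, here is a rough sketch of the same idea with HttpClient, which may be preferable on newer .NET versions where WebClient is marked obsolete. The class name LocalizedFetch and its methods are made up for illustration, and the 0.8 quality value is an arbitrary choice; treat this as one possible way to do it rather than the definitive one.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical helper, not part of any library.
public static class LocalizedFetch
{
    // Builds an HttpClient that prefers British English and falls back
    // to any English variant with a lower quality value (assumed: 0.8).
    public static HttpClient CreateClient()
    {
        var client = new HttpClient();
        client.DefaultRequestHeaders.AcceptLanguage.Add(
            new StringWithQualityHeaderValue("en-GB"));
        client.DefaultRequestHeaders.AcceptLanguage.Add(
            new StringWithQualityHeaderValue("en", 0.8));
        return client;
    }

    // Downloads the page as a string; the result can be passed to
    // HtmlDocument.LoadHtml in the same way as in the WebClient example.
    public static Task<string> DownloadAsync(HttpClient client, string url)
    {
        return client.GetStringAsync(url);
    }
}
```

Whether the server honours Accept-Language at all is up to the site, so this only helps when the content is not chosen by IP address.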

Darin Dimitrov answered Sep 20 '22 10:09



Location-based search results or pages are generally driven by your IP address, or by the location you give the site when you register. You may want to look into an anonymous proxy located in the country you would like to appear to be in.
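If you do find a proxy in the target country, a minimal sketch of routing WebClient through it could look like the following. The class name ProxyFetch is hypothetical, and the proxy host and port are placeholders you would have to replace with a real proxy; nothing here comes from the original question.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of any library.
public static class ProxyFetch
{
    // proxyHost and proxyPort are placeholders for a proxy located in the
    // country whose results you want to see.
    public static WebClient CreateClient(string proxyHost, int proxyPort)
    {
        var client = new WebClient();

        // Send all requests from this client through the given HTTP proxy,
        // so the site sees the proxy's IP address instead of yours.
        client.Proxy = new WebProxy(proxyHost, proxyPort);

        // Also ask for British English in case the site looks at the header.
        client.Headers[HttpRequestHeader.AcceptLanguage] = "en-GB";
        return client;
    }
}
```

You would then call something like ProxyFetch.CreateClient("uk-proxy.example.net", 8080), download the page with DownloadString, and load the returned HTML into Html Agility Pack with LoadHtml as usual. The host name here is only an example value.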

bizzehdee answered Sep 21 '22 10:09
