I have spent quite some time configuring my proxy. I currently use a service called proxybonanza, which supplies me with a proxy that I use to fetch webpages.
I'm using HtmlAgilityPack.
If I run my code without a proxy, there's no problem, either locally or when uploaded to my webhost's server.
If I use the proxy, it takes somewhat longer, but it still works locally.
If I publish my solution to my webhost, I get a SocketException (0x274c):
"A connection attempt failed because the connected party did not properly respond
after a period of time, or established connection failed because connected host has
failed to respond 38.69.197.71:45623"
I have been debugging this for a long time.
My app.config has two entries that are relevant to this:
<httpWebRequest useUnsafeHeaderParsing="true" />
<httpRuntime executionTimeout="180" />
That helped me through a couple of problems.
Now this is my C# code.
HtmlWeb htmlweb = new HtmlWeb();
htmlweb.PreRequest = new HtmlAgilityPack.HtmlWeb.PreRequestHandler(OnPreRequest);
HtmlDocument htmldoc = htmlweb.Load(@"http://www.websitetofetch.com",
    "IP", port, "username", "password");
// This is the PreRequest config
static bool OnPreRequest(HttpWebRequest request)
{
    request.KeepAlive = false;
    request.Timeout = 100000;
    request.ReadWriteTimeout = 1000000;
    request.ProtocolVersion = HttpVersion.Version10;
    return true; // ok, go on
}
What am I doing wrong? I have enabled tracing in app.config, but I don't get a log on my webhost.
Tracing section from app.config:
<system.diagnostics>
  <sources>
    <source name="System.ServiceModel.MessageLogging" switchValue="Warning, ActivityTracing" >
      <listeners>
        <add name="ServiceModelTraceListener"/>
      </listeners>
    </source>
    <source name="System.ServiceModel" switchValue="Verbose,ActivityTracing">
      <listeners>
        <add name="ServiceModelTraceListener"/>
      </listeners>
    </source>
    <source name="System.Runtime.Serialization" switchValue="Verbose,ActivityTracing">
      <listeners>
        <add name="ServiceModelTraceListener"/>
      </listeners>
    </source>
  </sources>
  <sharedListeners>
    <add initializeData="App_tracelog.svclog"
         type="System.Diagnostics.XmlWriterTraceListener, System, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089"
         name="ServiceModelTraceListener" traceOutputOptions="Timestamp"/>
  </sharedListeners>
</system.diagnostics>
Can anyone spot the problem? I have toggled these settings on and off what feels like a thousand times:
request.KeepAlive = false;
System.Net.ServicePointManager.Expect100Continue = false;
Carl
Try downloading the page as a string first, then passing it to HtmlAgilityPack. This separates errors that happen during the download from those that happen during HTML parsing. If the problem lies with proxybonanza (see the end of this post), you will be able to tell it apart from an HtmlAgilityPack configuration issue.
Download page using WebClient:
// Download page
System.Net.WebClient client = new System.Net.WebClient();
client.Proxy = new System.Net.WebProxy("{proxy address and port}");
string html = client.DownloadString("http://example.com");
// Process result
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(html);
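The question passes a username and password to `HtmlWeb.Load`, so the proxy presumably requires authentication. A minimal sketch of the same WebClient download with credentials attached to the proxy follows; the class and method names (`ProxyDownload`, `CreateProxy`, `Download`) are hypothetical, and the host, port, and credentials are placeholders you would replace with your own:

```csharp
using System.Net;

static class ProxyDownload
{
    // Builds a proxy that authenticates with the given credentials.
    public static WebProxy CreateProxy(string host, int port, string user, string password)
    {
        return new WebProxy(host, port)
        {
            Credentials = new NetworkCredential(user, password)
        };
    }

    // Downloads a page through the proxy and returns the raw HTML.
    public static string Download(string url, WebProxy proxy)
    {
        using (var client = new WebClient())
        {
            client.Proxy = proxy;
            return client.DownloadString(url);
        }
    }
}
```

The returned string can then be passed to `htmlDoc.LoadHtml` exactly as above.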
If you want more control over the request, use System.Net.HttpWebRequest:
// Create request
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://example.com/");
// Apply settings (including proxy)
request.Proxy = new WebProxy("{proxy address and port}");
request.KeepAlive = false;
request.Timeout = 100000;
request.ReadWriteTimeout = 1000000;
request.ProtocolVersion = HttpVersion.Version10;
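// Get response
string html = null; // declared outside the try block so it is still in scope below
try
{
    // Dispose the response, stream, and reader as soon as the body has been read
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (Stream stream = response.GetResponseStream())
    using (StreamReader reader = new StreamReader(stream))
    {
        html = reader.ReadToEnd();
    }
}
catch (WebException)
{
    // Handle web exceptions
}
catch (Exception)
{
    // Handle other exceptions
}
// Process result (html is still null if the download failed)
if (html != null)
{
    HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
    htmlDoc.LoadHtml(html);
}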
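Inside the `catch (WebException)` block it can help to log what actually failed. The sketch below is one possible way to do that, not part of HtmlAgilityPack; `WebErrors.Describe` is a hypothetical helper name. It relies on `WebException.Status`, `WebException.Response`, and an inner `SocketException`, which is where the error code from the question (0x274C, i.e. 10060, `SocketError.TimedOut`) would normally surface:

```csharp
using System.Net;
using System.Net.Sockets;

static class WebErrors
{
    // Turns a WebException into a short diagnostic string.
    public static string Describe(WebException ex)
    {
        // A failed TCP connection usually carries a SocketException inside.
        var socketEx = ex.InnerException as SocketException;
        if (socketEx != null)
        {
            return "Socket error " + socketEx.SocketErrorCode + " (status " + ex.Status + ")";
        }

        // If the proxy or target server answered, report the HTTP status code.
        var response = ex.Response as HttpWebResponse;
        if (response != null)
        {
            return "HTTP " + (int)response.StatusCode + " (status " + ex.Status + ")";
        }

        return "Status " + ex.Status + ": " + ex.Message;
    }
}
```

A socket-level timeout points at the network path to the proxy, while an HTTP 407 response would point at proxy authentication instead.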
Also, ensure that your proxy provider (proxybonanza) allows access from your production environment to your proxies. Most providers will limit access to the proxies to certain IP addresses. They may have allowed access to the external IP of the network where you are running locally but NOT the external IP address of your production environment.
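One way to check this from the production server is a plain TCP connection test against the proxy address, with no HTTP or HtmlAgilityPack involved. This is only a rough sketch: `ProxyProbe.CanConnect` is a hypothetical helper, and the timeout value is an arbitrary choice.

```csharp
using System;
using System.Net.Sockets;

static class ProxyProbe
{
    // Returns true if a TCP connection to host:port opens within timeoutMs.
    public static bool CanConnect(string host, int port, int timeoutMs)
    {
        using (var client = new TcpClient())
        {
            try
            {
                // Wait returns false if the connection attempt is still pending at the timeout.
                return client.ConnectAsync(host, port).Wait(timeoutMs) && client.Connected;
            }
            catch (AggregateException)
            {
                // The attempt failed outright (for example, connection refused).
                return false;
            }
        }
    }
}
```

Calling `ProxyProbe.CanConnect("38.69.197.71", 45623, 10000)` on the webhost and getting `false` would suggest the host cannot reach the proxy at all, whether because of the provider's IP restrictions or an outbound firewall rule at the webhost. A `true` result only shows that the port is reachable; the proxy could still reject the request afterwards.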