First of all, apologies for my lack of technical knowledge and any miscommunication; I'm quite a newbie to C#.
I've taken over a project which scrapes a number of webpages and saves them as .png files.
private void CaptureWebPage(string URL, string filePath, ImageFormat format)
{
    // Off-screen WebBrowser control used purely to render the page.
    System.Windows.Forms.WebBrowser web = new System.Windows.Forms.WebBrowser();
    web.ScrollBarsEnabled = false;        // keep scroll bars out of the capture
    web.ScriptErrorsSuppressed = true;    // don't pop up script error dialogs
    web.Navigate(URL);

    // Pump messages until the document reports it has finished loading.
    while (web.ReadyState != System.Windows.Forms.WebBrowserReadyState.Complete)
        System.Windows.Forms.Application.DoEvents();
    System.Threading.Thread.Sleep(5000);  // extra settling time for late-loading content

    // Size the control to the full document, with a 10% margin and a minimum width.
    int width = web.Document.Body.ScrollRectangle.Width;
    width += width / 10;
    width = width <= 300 ? 600 : width;
    int height = web.Document.Body.ScrollRectangle.Height;
    height += height / 10;
    web.Width = width;
    web.Height = height;

    // _bmp is a Bitmap field declared elsewhere on the class.
    _bmp = new System.Drawing.Bitmap(width, height);
    web.DrawToBitmap(_bmp, new System.Drawing.Rectangle(0, 0, width, height));
    _bmp.Save(filePath, format);
    _bmp.Dispose();
}
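For context, the method is called roughly like this (the URL and output path below are just placeholders); as far as I understand, the WebBrowser control has to run on an STA thread:

System.Threading.Thread worker = new System.Threading.Thread(() =>
    CaptureWebPage("http://example.com", @"C:\captures\example.png",
                   System.Drawing.Imaging.ImageFormat.Png));
worker.SetApartmentState(System.Threading.ApartmentState.STA); // WebBrowser requires STA
worker.Start();
worker.Join();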
However, some of the pages (only a small few) cause the process to hang. It's not every time, but it happens fairly often. I've discovered the problem seems to be in the following part of the code:
while (web.ReadyState != System.Windows.Forms.WebBrowserReadyState.Complete)
    System.Windows.Forms.Application.DoEvents();
It looks as though web.ReadyState gets stuck at 'Interactive' and never progresses to 'Complete', so the loop never exits.
Is it possible to add code that restarts the process for that page if web.ReadyState stays at 'Interactive' for a certain amount of time, and if so, what would the syntax be?
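Something along these lines is what I had in mind, though it's only a sketch and the helper name, timeout and retry count are made up; it waits up to a timeout for the page to complete, and re-issues Navigate if it's still stuck:

private bool NavigateWithRetry(System.Windows.Forms.WebBrowser web, string url,
                               System.TimeSpan timeout, int maxAttempts)
{
    for (int attempt = 0; attempt < maxAttempts; attempt++)
    {
        web.Navigate(url);
        System.DateTime started = System.DateTime.UtcNow;

        // Pump messages until the page completes or the timeout elapses.
        while (web.ReadyState != System.Windows.Forms.WebBrowserReadyState.Complete
               && System.DateTime.UtcNow - started < timeout)
        {
            System.Windows.Forms.Application.DoEvents();
            System.Threading.Thread.Sleep(50);
        }

        if (web.ReadyState == System.Windows.Forms.WebBrowserReadyState.Complete)
            return true;          // page finished loading

        web.Stop();               // abandon the stuck navigation before retrying
    }
    return false;                 // never reached Complete
}

In CaptureWebPage I would then replace the Navigate call and the while loop with something like if (!NavigateWithRetry(web, URL, System.TimeSpan.FromSeconds(30), 3)) return; but I'm not sure whether that's the right approach.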
I've replaced the existing problematic code with the following (found on thebotnet.com):
while (web.IsBusy)
    System.Windows.Forms.Application.DoEvents();

// Poll up to 500 times at 10 ms intervals (roughly 5 seconds in total),
// so the wait can no longer loop forever.
for (int i = 0; i < 500; i++)
{
    if (web.ReadyState != System.Windows.Forms.WebBrowserReadyState.Complete)
    {
        System.Windows.Forms.Application.DoEvents();
        System.Threading.Thread.Sleep(10);
    }
    else
        break;
}
System.Windows.Forms.Application.DoEvents();
I've tested it a few times and all pages seem to be scraped fine. As far as I can tell, the new loop caps the wait at roughly five seconds (500 × 10 ms), so my main worry is that a slow page could be captured before it has finished loading. I'll continue testing just in case, but if you know of any other issues this could cause, please let me know, as I may not find them myself.