
Web.ReadyState does not reach 'complete' stage

Tags:

c#

winforms

First of all apologies for my lack of technical knowledge and probable miscommunication, I'm quite a newbie to C#.

I've taken over a project which scrapes a number of webpages and saves them as .png files.

private void CaptureWebPage(string URL, string filePath, ImageFormat format)
{
    System.Windows.Forms.WebBrowser web = new System.Windows.Forms.WebBrowser();
    web.ScrollBarsEnabled = false;
    web.ScriptErrorsSuppressed = true;
    web.Navigate(URL);

    // Pump messages until the page reports it has finished loading,
    // then allow extra time for late-running scripts.
    while (web.ReadyState != System.Windows.Forms.WebBrowserReadyState.Complete)
        System.Windows.Forms.Application.DoEvents();
    System.Threading.Thread.Sleep(5000);

    // Size the control to the full document, padded by 10%,
    // with a minimum width of 600px.
    int width = web.Document.Body.ScrollRectangle.Width;
    width += width / 10;
    width = width <= 300 ? 600 : width;

    int height = web.Document.Body.ScrollRectangle.Height;
    height += height / 10;

    web.Width = width;
    web.Height = height;

    // Render the control into a bitmap and save it.
    _bmp = new System.Drawing.Bitmap(width, height);
    web.DrawToBitmap(_bmp, new System.Drawing.Rectangle(0, 0, width, height));
    _bmp.Save(filePath, format);
    _bmp.Dispose();
}

However, a small number of the pages cause the process to hang. It's not every time, but it happens fairly often. I've discovered the problem seems to be in the following part of the code:

while (web.ReadyState != System.Windows.Forms.WebBrowserReadyState.Complete)
    System.Windows.Forms.Application.DoEvents();

It looks as though web.ReadyState gets stuck at 'Interactive' and never progresses to 'Complete', so it just keeps looping.

Is it possible to put in code that restarts the process for that page if web.ReadyState stays at 'Interactive' for a certain amount of time, and if so, what would the syntax be?
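One common pattern for this (a sketch only; the helper name, timeout, and retry count below are illustrative and not from the original project) is to bound the polling loop with a Stopwatch and re-issue Navigate if the page stalls:

```csharp
// Hypothetical helper: waits for ReadyState == Complete, re-navigating
// if the page stalls (e.g. stuck at Interactive). Returns false if the
// page never finishes within the allowed attempts.
private bool NavigateWithTimeout(System.Windows.Forms.WebBrowser web,
                                 string url,
                                 int timeoutMs = 10000,
                                 int maxRetries = 2)
{
    for (int attempt = 0; attempt <= maxRetries; attempt++)
    {
        web.Navigate(url);
        var sw = System.Diagnostics.Stopwatch.StartNew();

        while (web.ReadyState != System.Windows.Forms.WebBrowserReadyState.Complete)
        {
            if (sw.ElapsedMilliseconds > timeoutMs)
                break; // stuck; abandon this attempt and retry
            System.Windows.Forms.Application.DoEvents();
            System.Threading.Thread.Sleep(10);
        }

        if (web.ReadyState == System.Windows.Forms.WebBrowserReadyState.Complete)
            return true;
    }
    return false; // page never finished loading
}
```

The caller can then skip (or log) any page for which this returns false instead of hanging forever.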

Asked Jul 01 '13 by MarkyWil

1 Answer

I've replaced the problematic code with the following (found on thebotnet.com):

// Wait for any pending navigation to finish.
while (web.IsBusy)
    System.Windows.Forms.Application.DoEvents();

// Poll ReadyState for up to 500 * 10 ms (about 5 seconds),
// bailing out early once the page is complete.
for (int i = 0; i < 500; i++)
{
    if (web.ReadyState == System.Windows.Forms.WebBrowserReadyState.Complete)
        break;
    System.Windows.Forms.Application.DoEvents();
    System.Threading.Thread.Sleep(10);
}
System.Windows.Forms.Application.DoEvents();

I've tested it a few times and all pages seem to be scraped fine. I'll continue testing just in case, but if you know of any issues this could cause, please let me know, as I may not find them myself.
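Note that the loop above effectively caps the wait at roughly 500 × 10 ms ≈ 5 seconds, after which the code proceeds whether or not the page finished, so a stalled page can still be captured half-loaded. An alternative worth considering (a sketch under assumptions; the flag name and timeout are illustrative, not part of the answer) is to subscribe to the control's DocumentCompleted event rather than polling ReadyState:

```csharp
// Sketch: event-driven wait instead of polling ReadyState.
// Caveat: DocumentCompleted can fire once per frame on framed pages,
// so this treats the first firing as "done".
bool completed = false;
web.DocumentCompleted += (s, e) => completed = true;
web.Navigate(URL);

var sw = System.Diagnostics.Stopwatch.StartNew();
while (!completed && sw.ElapsedMilliseconds < 10000)
{
    System.Windows.Forms.Application.DoEvents();
    System.Threading.Thread.Sleep(10);
}

if (!completed)
{
    // Timed out: skip this page or retry the navigation.
}
```

Either way, checking the outcome after the wait (rather than assuming success) avoids saving a partially rendered page.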

Answered Nov 17 '22 by MarkyWil