
Screen scraping web page containing button with AJAX

I am trying to automate some of our processes. One of them involves logging in to an external web page, clicking a link to expand some details, and then grabbing all of the details displayed.

The login step works, and I can grab all of the details once they are expanded.

The problem is with clicking the link. The link is defined as below (I have removed what the Submit method actually does, as the code is long and probably irrelevant; the img is just a placeholder):

<a id="form:SummarySubView:closedToggleControl" onclick="A4J.AJAX.Submit(...); return false;" href="#">
    <img ... />
</a>

I try to click the link as below:

void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    WebBrowser browser = (WebBrowser)sender;

    HtmlElement expandDetails = browser.Document.GetElementById("form:SummarySubView:closedToggleControl");
    //When open ID for element is "form:SummarySubView:openToggleControl"

    if(expandDetails == null) //If already expanded
    {
        //Stuff
    }
    else
    {
        expandDetails.InvokeMember("click"); //Click on element to run AJAX
    }
}

After expandDetails.InvokeMember("click") runs, browser_DocumentCompleted gets called again as expected, but the document is the same as before and expandDetails is found again with the "closed" id. This means that the details I am looking for are never shown.

How do I get access to the document AFTER the AJAX script runs correctly?

Adding a Timer to delay checking the document doesn't seem to have worked.

anothershrubery asked Feb 01 '17 15:02


2 Answers

So a really simple solution seems to have worked. My code now looks like:

void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    WebBrowser browser = (WebBrowser)sender;

    HtmlElement expandDetails = browser.Document.GetElementById("form:SummarySubView:closedToggleControl");
    //When open ID for element is "form:SummarySubView:openToggleControl"

    if(expandDetails == null) //If already expanded
    {
        //Stuff
    }
    else
    {
        expandDetails.InvokeMember("click"); //Click on element to run AJAX

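        //Poll until the "closed" toggle is gone, i.e. the AJAX update has been applied
        while (expandDetails != null)
        {
            expandDetails = browser.Document.GetElementById("form:SummarySubView:closedToggleControl");

            Application.DoEvents(); //Let the browser process pending messages
            System.Threading.Thread.Sleep(200);
        }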

        //Stuff
    }
}

So polling in the while loop until the "closed" element is gone works fine for my case.
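The loop has no upper bound, so if the AJAX call never completes the handler would keep spinning. Below is a minimal sketch of the same polling idea with a timeout. `Poller` and `WaitUntil` are hypothetical names introduced here for illustration, not part of WinForms, and the `pump` callback is where `Application.DoEvents` would be passed in.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper (not part of WinForms): polls a condition until it
// becomes true or the timeout elapses.
public static class Poller
{
    // Returns true if the condition was met, false if the timeout expired first.
    public static bool WaitUntil(Func<bool> condition, TimeSpan timeout,
                                 TimeSpan interval, Action pump = null)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (!condition())
        {
            if (clock.Elapsed >= timeout)
                return false;      // give up instead of looping forever

            pump?.Invoke();        // e.g. Application.DoEvents in a WinForms app
            Thread.Sleep(interval);
        }
        return true;
    }
}
```

In the handler this might be called as `Poller.WaitUntil(() => browser.Document.GetElementById("form:SummarySubView:closedToggleControl") == null, TimeSpan.FromSeconds(10), TimeSpan.FromMilliseconds(200), Application.DoEvents)`, where the 10 second limit is an arbitrary assumption to tune for the page in question.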

anothershrubery answered Nov 14 '22 19:11


Okay, first off, the DocumentCompleted event will fire for ALL frames in the page. So if you have 5 iframes you will get 6 DocumentCompleted events.

So you will need to check to see if you are actually the top level window or not. Doing that alone may fix your problem.

private void WebBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    WebBrowser wb = sender as WebBrowser;
    //check to make sure we are on the TOP-level page.
    if (wb.Document.Window.Parent == null)
    {
        //do whatever else you need to here
    }
}
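If the Parent test does not filter out frame events in your setup, another commonly used check compares the URL that just finished loading (`e.Url`) with the browser's own `Url`. The sketch below isolates that comparison in a small function; `FrameFilter` and `IsTopLevelCompletion` are hypothetical names, and the approach assumes that frames load from URLs that differ from the top-level page.

```csharp
using System;

// Hypothetical helper: decides whether a DocumentCompleted event belongs to
// the top-level page by comparing URLs.
public static class FrameFilter
{
    public static bool IsTopLevelCompletion(Uri completedUrl, Uri browserUrl)
    {
        if (completedUrl == null || browserUrl == null)
            return false;

        // Compare scheme, host, port, path and query; the fragment ("#...") is ignored.
        return Uri.Compare(completedUrl, browserUrl,
                           UriComponents.HttpRequestUrl,
                           UriFormat.SafeUnescaped,
                           StringComparison.Ordinal) == 0;
    }
}
```

Inside the handler it would be used as `if (FrameFilter.IsTopLevelCompletion(e.Url, wb.Url)) { ... }`.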

If that doesn't do it, you can just use a timer to wait a few seconds after the document completes.

public partial class Form1 : Form
{
    Timer t;
    public Form1()
    {
        InitializeComponent();
        webBrowser1.DocumentCompleted += WebBrowser1_DocumentCompleted;
    }
    private void WebBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        WebBrowser wb = sender as WebBrowser;

        //check to make sure we are on the TOP-level page.
        if (wb.Document.Window.Parent == null)
        {
            t = new Timer();
            t.Tick += (timerSender, eventArgs) =>
            {
                t.Stop(); //stop first so the work below only runs once
                //do whatever else you need to here
            };
            t.Interval = 2000; //wait 2 seconds for the page's scripts to finish
            t.Start();
        }
    }
}

You could tweak the timer to be longer or shorter as needed. But that should get you what you need.
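The same fixed delay can also be written without a Timer field by awaiting `Task.Delay`. This is only a sketch: `DelayedAction` and `RunAfter` are hypothetical names, and like the timer version it still guesses how long the AJAX call takes. When awaited from a WinForms event handler, the continuation normally resumes on the UI thread, so the action can touch the browser control.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: runs an action once after a fixed delay.
public static class DelayedAction
{
    public static async Task RunAfter(TimeSpan delay, Action action)
    {
        await Task.Delay(delay);   // non-blocking wait; the UI stays responsive
        action();
    }
}
```

In an `async void` DocumentCompleted handler this could be called as `await DelayedAction.RunAfter(TimeSpan.FromSeconds(2), () => { /* read the document */ });`.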

Alexander Ryan Baggett answered Nov 14 '22 20:11