I'm trying to scrape a page. Everything works, but when the values are updated, the page's source code stays the same for about a minute. Even when I refresh the page on a slow internet connection, I first see the old data, and the values become current only after the page has fully loaded. I guess JavaScript updates them, but it still has to download them somehow.
How can I get the current values?
I'm writing my program in C#, but if you have any ideas, advice, or examples, the language doesn't really matter.
Thank you.
You're right - JavaScript is probably updating the data after the page loads.
I can think of three ways to handle this:
Use a WebBrowser control - I guess you're using the HttpWebRequest object to retrieve values from the site. This won't work if you need to let the JavaScript run. Instead, you can use the WebBrowser control, let the JavaScript run, and retrieve the values from the DOM. The only thing I don't like about this approach is that it feels like a hack and is probably too clunky for production applications. You also need to know when to read the contents of the DOM (an update might be in progress in the background). Google "C# WebBrowser Control Read DOM Programmatically" or you can read more about that here.
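The "when to read the DOM" problem can be sketched as a small polling loop. This is only a minimal illustration in Python (the question says the language doesn't matter); `wait_for_stable_value` and its parameters are hypothetical names, and `read_value` stands in for whatever code reads the element's text from your browser control. It assumes that a value which stops changing for a few consecutive reads has finished updating, which may not hold for every page.

```python
import time


def wait_for_stable_value(read_value, stable_reads=3, interval=0.5,
                          timeout=30.0, sleep=time.sleep):
    """Poll read_value() until it returns the same non-None result
    `stable_reads` times in a row, or give up after `timeout` seconds.

    read_value is a hypothetical callable supplied by the caller; it should
    return the current text of the DOM element, or None if it is not there yet.
    """
    deadline = time.monotonic() + timeout
    last = None
    streak = 0
    while time.monotonic() < deadline:
        current = read_value()
        if current is not None and current == last:
            streak += 1          # same value as the previous read
        else:
            last = current       # value changed (or is missing): restart the count
            streak = 1 if current is not None else 0
        if streak >= stable_reads:
            return current
        sleep(interval)
    return None                  # never settled within the timeout
```

Note that a stale value that sits unchanged for long enough will also be accepted, so `stable_reads` and `interval` would need tuning against how quickly the page's script actually runs.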
Call the background request yourself - I personally prefer this over the previous approach, but it doesn't work all the time. First you need to inspect the website with Firebug or a similar tool and see which URLs are called in the background. Say, for example, the site is updating stock quotes using JavaScript. Most likely, it's using an asynchronous request to retrieve the updated information from a web service. Using Firebug, you can view this under Net > XHR. Now comes the hard part: take a look at the request and the values returned. The idea is that you can try to retrieve the values yourself and parse the contents, which can be a lot easier than scraping a page. The problem is that you would need to do a bit of reverse engineering to get it right. You might also encounter problems with authentication and/or encryption.
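As a rough sketch of that idea, again in Python: suppose the XHR tab shows the page fetching JSON from some endpoint. Everything specific here is an assumption for illustration only: the function names, the `quotes`, `symbol`, and `price` field names, and the request headers. The real site will have its own URL, response format, and requirements.

```python
import json
import urllib.request


def parse_quotes(payload):
    """Turn a JSON payload into {symbol: price}.

    Assumes a hypothetical shape: {"quotes": [{"symbol": ..., "price": ...}]}.
    """
    data = json.loads(payload)
    return {item["symbol"]: float(item["price"]) for item in data["quotes"]}


def fetch_quotes(url, timeout=10.0):
    """Request the data URL the page's script calls and parse the response."""
    request = urllib.request.Request(
        url,
        headers={
            # Some endpoints check for headers a browser's XHR would send;
            # whether this one does is an assumption.
            "X-Requested-With": "XMLHttpRequest",
            "Accept": "application/json",
            "Cache-Control": "no-cache",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return parse_quotes(response.read().decode(charset))
```

You would call `fetch_quotes` with the URL you found in the XHR tab. If the endpoint needs cookies or a token, those have to be copied from the browser's request as well, which is where the reverse engineering comes in.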
Lastly, and my most preferred solution, is asking the owner [of the site you are scraping] directly.