I have a script (below) that scrapes a site in a three-step process. It works great when limited to a maximum of one page at a time. However, when I increase that to two at a time, things start getting wonky: onLoadFinished fires earlier than I would expect, while the page isn't completely loaded yet, and the rest of my script breaks because of it. Any idea why this might be happening? I should add that I'm using the newest version of PhantomJS (1.5).
MAX_PAGES = 1

###
Changing MAX_PAGES to >1 causes some pages' onLoadFinished event to fire before
the page is fully rendered. This is evident from the fact that there are >1 images
for some pages. I haven't been able to reproduce this using microsoft.com, but on some
pages I was working on, the first onLoadFinished seemed to be called before the page
was actually fully loaded, judging by the look of the rendered images.
###

newPage = (id) ->
  # Per-page state: an id for file names and a counter of loads started so far.
  context = {}
  context.id = id
  context.step = 0
  context.page = require('webpage').create()

  context.page.onLoadStarted = ->
    context.step++

  context.page.onLoadFinished = (status) ->
    console.log status
    if status is 'success'
      # One screenshot per finished load, named <id>_<step>.png
      context.page.render("#{context.id}_#{context.step}.png")
    else
      context.page.release()

  context.page.open('http://www.microsoft.com')
  console.log 'started loading'

newPage id for id in [1..MAX_PAGES]
Because of its rendering features, PhantomJS can be used to capture web pages, essentially taking a screenshot of the contents. The following loadspeed.js script loads a specified URL (do not forget the http protocol) and measures the time it takes to load it.
It is very important to call phantom.exit at some point in the script, otherwise PhantomJS will not be terminated at all.
Page Loading
A web page can be loaded, analyzed, and rendered by creating a webpage object. The following script demonstrates the simplest use of the page object.
The problem is that many websites load their minor content asynchronously, which is why PhantomJS's onLoadFinished callback (the analogue of onload in HTML) fires too early, before everything has loaded. Can anyone suggest how I can wait for a web page to load fully so that I can, for example, take a screenshot that includes all dynamic content such as ads?
/**
 * See https://github.com/ariya/phantomjs/blob/master/examples/waitfor.js
 *
 * Wait until the test condition is true or a timeout occurs. Useful for waiting
 * on a server response or for a ui change (fadeIn, etc.) to occur.
 *
 * @param testFx javascript condition that evaluates to a boolean,
 * it can be passed in as a string.
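The comment above describes polling a condition until it holds or a timeout expires. A minimal sketch of that idea follows; it is not the waitfor.js file itself. The name waitFor comes from that example, but the onTimeout, timeoutMs, and pollMs parameters, their defaults, and the '#ads' selector in the usage comment are assumptions made for illustration.

```javascript
// Sketch only: poll testFx until it returns true or timeoutMs elapses.
// onTimeout, timeoutMs and pollMs are illustrative parameters, not part
// of any PhantomJS API.
function waitFor(testFx, onReady, onTimeout, timeoutMs, pollMs) {
    var limit = timeoutMs || 3000;   // assumed default: give up after 3 s
    var every = pollMs || 100;       // assumed default: poll every 100 ms
    var start = new Date().getTime();

    var timer = setInterval(function () {
        var elapsed = new Date().getTime() - start;
        if (testFx()) {
            clearInterval(timer);
            onReady(elapsed);
        } else if (elapsed >= limit) {
            clearInterval(timer);
            onTimeout(elapsed);
        }
    }, every);

    return timer;
}

// Hypothetical PhantomJS usage (the '#ads' selector is made up):
// page.open(url, function (status) {
//     waitFor(function () {
//         return page.evaluate(function () {
//             return document.querySelector('#ads') !== null;
//         });
//     }, function () {
//         page.render('page.png');
//         phantom.exit();
//     }, function () {
//         phantom.exit(1);
//     }, 5000);
// });
```

The condition is checked before the timeout on every tick, so a page that becomes ready on the last poll is still treated as ready rather than timed out.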
I think the problem is that every webpage within PhantomJS uses the same QNetworkAccessManager, so the finished() signal fires whenever any webpage object finishes loading. PhantomJS's code might need to be modified to fix this. I have noticed the same thing before when trying to load multiple pages in parallel in PhantomJS. An application I'm working on uses QtWebKit and loads multiple pages simultaneously, so I have to make sure that each webpage gets its own QNetworkAccessManager to keep the finished() signals from interfering with each other.
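Since the question reports that loading one page at a time works, one possible workaround (not a fix for the shared-signal behaviour described above) is to queue the pages and cap how many load at once. The sketch below assumes a hypothetical helper named runQueue; the page.open callback form, the ids, and the output file names are illustrative choices, and the PhantomJS part only runs when the phantom global exists.

```javascript
// Sketch only: run loadOne(item, done) for every item, with at most
// `limit` loads in flight. runQueue is a hypothetical helper, not a
// PhantomJS API.
function runQueue(items, limit, loadOne, onAllDone) {
    var next = 0;      // index of the next item to start
    var active = 0;    // loads currently in flight
    var finished = 0;  // loads that have reported completion

    function pump() {
        while (active < limit && next < items.length) {
            active++;
            loadOne(items[next++], function () {
                active--;
                finished++;
                if (finished === items.length) {
                    onAllDone();
                } else {
                    pump();
                }
            });
        }
    }

    if (items.length === 0) {
        onAllDone();
    } else {
        pump();
    }
}

// Illustrative wiring; skipped when not running under PhantomJS.
if (typeof phantom !== 'undefined') {
    runQueue([1, 2, 3], 1, function (id, done) {
        var page = require('webpage').create();
        page.open('http://www.microsoft.com', function (status) {
            if (status === 'success') {
                page.render(id + '.png');
            }
            page.release();
            done();
        });
    }, function () {
        phantom.exit();
    });
}
```

With limit set to 1 the pages load strictly one after another, which matches the configuration the question says works; raising it reintroduces parallel loads and, presumably, the early onLoadFinished.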