I'm using Mechanize, although I'm open to Nokogiri if Mechanize can't do it.
I'd like to scrape the page after all the scripts have loaded as opposed to beforehand.
How might I do this?
A good option is to combine Nokogiri, Watir, and PhantomJS, something like this:
require 'watir'
require 'nokogiri'
b = Watir::Browser.new(:phantomjs)
b.goto URL
doc = Nokogiri::HTML(b.html)
b.close
The resulting doc reflects the page after its scripts have run. PhantomJS is convenient because it is headless, so no browser window needs to open.
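One caveat: goto returns once the page itself has loaded, but scripts that fetch data asynchronously may still be running at that point. Below is a minimal sketch of a polling helper that waits for a condition before the HTML is handed to Nokogiri. The name wait_until, its timeout and interval parameters, and the "results" element id in the usage comment are all hypothetical, chosen only for illustration. Watir also has its own waiting helpers, which may be the better choice in practice.

```ruby
# Hypothetical helper (not part of Watir or Nokogiri): calls the given block
# repeatedly until it returns a truthy value, or raises once the timeout passes.
def wait_until(timeout: 10, interval: 0.2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "condition not met within #{timeout} seconds"
    end
    sleep interval
  end
end

# Usage with the browser object `b` from the snippet above. Assumption: the
# page inserts an element with id "results" once its scripts have finished.
#   wait_until { b.html.include?('id="results"') }
#   doc = Nokogiri::HTML(b.html)
```

Checking for a specific element that the scripts create tends to be more reliable than sleeping for a fixed number of seconds, since it returns as soon as the content appears.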
Nokogiri and Mechanize are not full web browsers and do not run JavaScript in a browser-model DOM. You want to use something like Watir or Selenium, which let you use Ruby to control an actual web browser.