I'm using Ruby, Selenium WebDriver and Nokogiri to retrieve data from web pages. Once the proper HTML is loaded, I print the text of the elements with a certain class.
For example,
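```ruby
require "selenium-webdriver"
require "nokogiri"

browser = Selenium::WebDriver.for :chrome
browser.get "https://jsfiddle.net"

# Parse the rendered HTML and print the text of every element with class "aiButton"
doc = Nokogiri::HTML.parse(browser.page_source)
puts doc.css('.aiButton').map(&:text).join(',')
browser.quit
```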
By far the hardest part, I've found, is getting the correct HTML loaded. For example, the content I want might be hidden by JavaScript, or might be on a different page.
Is it possible to use Selenium to load the page, then manually manipulate the page so the correct HTML is displayed, and then allow the bot to finish and print the content it's supposed to?
You can use Selenium to interact with the web page: fill in form fields, click buttons, and so on. You can even execute your own JavaScript code.
Selenium cheat sheet
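A minimal sketch of those interactions, assuming the content you want only appears after typing into a search field and clicking a button. The `reveal_content` helper and the `#search` and `#show-more` selectors are hypothetical; adapt them to the page you are scraping. The helper only calls methods that `Selenium::WebDriver::Driver` and its elements provide (`find_element`, `send_keys`, `click`, `execute_script`, `page_source`), and it hands back HTML you can feed to Nokogiri as before.

```ruby
# Hypothetical helper: drive the page through Selenium until the wanted content
# is present, then return the page HTML for Nokogiri.
# `browser` is expected to be a Selenium::WebDriver::Driver.
def reveal_content(browser, query)
  # Fill in a form field ("#search" is a made-up selector)
  browser.find_element(css: '#search').send_keys(query)

  # Click a button that loads more content ("#show-more" is also made up)
  browser.find_element(css: '#show-more').click

  # Run your own JavaScript, e.g. scroll to the bottom so lazily loaded content is fetched
  browser.execute_script('window.scrollTo(0, document.body.scrollHeight);')

  # Return the HTML as it is rendered right now
  browser.page_source
end
```

For example, after `browser.get ...` you could call `doc = Nokogiri::HTML.parse(reveal_content(browser, "ruby"))`. If the new content arrives asynchronously, you may also need to wait for it before reading `page_source`; selenium-webdriver provides `Selenium::WebDriver::Wait` for that.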
Edit:
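You can use pry to pause the script so that you can manipulate the web page by hand:
```ruby
require "selenium-webdriver"
require "nokogiri"
require "pry"

# Start the Selenium session and open the web page
browser = Selenium::WebDriver.for :chrome
browser.get "https://jsfiddle.net"

# Pause here and manipulate the page by hand; type `exit` at the pry prompt to resume
binding.pry

# Collect the results from the manually manipulated page
doc = Nokogiri::HTML.parse(browser.page_source)
puts doc.css('.aiButton').map(&:text).join(',')
```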
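If you would rather not depend on pry, a plain `gets` can pause the script at the same point. This is a swapped-in technique, not the pry approach this answer describes, and the `pause_for_manual_edits` helper is hypothetical. It prints a prompt and blocks until a line (or end of input) is read from standard input.

```ruby
# Hypothetical helper: block until the user presses Enter, as a pry-free
# alternative to `binding.pry` for pausing between the automated steps.
def pause_for_manual_edits(input: $stdin, output: $stdout)
  output.puts "Manipulate the page in the browser window, then press Enter to continue..."
  input.gets # returns once a line (or EOF) is read
  nil
end
```

Call `pause_for_manual_edits` where the pry version calls `binding.pry`; the `input` and `output` parameters exist only so the prompt can be redirected, and default to the terminal.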