I have a program that scrapes value from https://web.apps.markit.com/WMXAXLP?YYY2220_zJkhPN/sWPxwhzYw8K4DcqW07HfIQykbYMaXf8fTzWT6WKnuivTcM0W584u1QRwj
My current code is:
doc = Nokogiri::HTML(open(source_url))
puts doc.css('span.indexDate').text
date = doc.css('span.indexDate').text
date = Date.parse(date)
puts date
values = doc.css('table#CdsIndexTable td.col2 span')
puts values
This scrapes the date and values of the second column from the "CDS Indexes" table correctly which is fine. Now, I want to scrape the similar values from the "Bond Indexes" table where I am facing the problem.
I can see a JavaScript function changes it without loading the page and without changing the URL of the page. The difference between these two tables is their IDs are different which is exactly that it should be. But, unfortunately when I try with:
values = doc.css('table#BondIndexTable')
puts values
I get nothing from the Bond Indexes table. But I get values from CDS Indexes table if I use:
values = doc.css('table#CdsIndexTable')
puts values
How can I get the values from both tables?
You can use Capybara with the Poltergeist driver to execute the Javascript and format the page. Poltergeist is a wrapper for the PhantomJS headless browser. Here's an example of how you can do it:
require 'rubygems'
require 'capybara'
require 'capybara/dsl'
require 'capybara/poltergeist'
Capybara.default_driver = :poltergeist
Capybara.run_server = false
module GetPrice
class WebScraper
include Capybara::DSL
def get_page_data(url)
visit(url)
doc = Nokogiri::HTML(page.html)
doc.css('td.col2 span')
end
end
end
scraper = GetPrice::WebScraper.new
puts scraper.get_page_data('https://web.apps.markit.com/WMXAXLP?YYY2220_zJkhPN/sWPxwhzYw8K4DcqW07HfIQykbYMaXf8fTzWT6WKnuivTcM0W584u1QRwj').map(&:text).inspect
Visit here for a complete example using Amazon.com: https://github.com/wakproductions/amazon_get_price/blob/master/getprice.rb
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With