
Scraping with Nokogiri and Ruby before and after JavaScript changes the value

I have a program that scrapes values from https://web.apps.markit.com/WMXAXLP?YYY2220_zJkhPN/sWPxwhzYw8K4DcqW07HfIQykbYMaXf8fTzWT6WKnuivTcM0W584u1QRwj

My current code is:

require 'nokogiri'
require 'open-uri'
require 'date'

# source_url holds the page URL given above
doc = Nokogiri::HTML(URI.open(source_url))

date_text = doc.css('span.indexDate').text
puts date_text
date = Date.parse(date_text)
puts date
values = doc.css('table#CdsIndexTable td.col2 span')
puts values

This correctly scrapes the date and the second-column values from the "CDS Indexes" table. Now I want to scrape the corresponding values from the "Bond Indexes" table, which is where I run into a problem.

I can see that a JavaScript function switches between the tables without reloading the page or changing its URL. The only difference between the two tables is their IDs, as expected. Unfortunately, when I try:

values = doc.css('table#BondIndexTable')
puts values

I get nothing from the Bond Indexes table, but I do get values from the CDS Indexes table if I use:

values = doc.css('table#CdsIndexTable')
puts values

How can I get the values from both tables?

K M Rakibul Islam asked Nov 13 '12 16:11



1 Answer

You can use Capybara with the Poltergeist driver to execute the JavaScript and render the page before parsing it. Poltergeist is a Capybara driver for the PhantomJS headless browser. Here's an example of how you can do it:

require 'nokogiri'
require 'capybara'
require 'capybara/dsl'
require 'capybara/poltergeist'

# Use the headless PhantomJS browser; no local Rack app needs to be started.
Capybara.default_driver = :poltergeist
Capybara.run_server = false

module GetPrice
  class WebScraper
    include Capybara::DSL

    # Loads the page in the browser so its JavaScript runs, then returns
    # the second-column spans from the rendered HTML.
    def get_page_data(url)
      visit(url)
      doc = Nokogiri::HTML(page.html)
      doc.css('td.col2 span')
    end
  end
end

scraper = GetPrice::WebScraper.new
url = 'https://web.apps.markit.com/WMXAXLP?YYY2220_zJkhPN/sWPxwhzYw8K4DcqW07HfIQykbYMaXf8fTzWT6WKnuivTcM0W584u1QRwj'
puts scraper.get_page_data(url).map(&:text).inspect

A complete example that scrapes Amazon.com is available here: https://github.com/wakproductions/amazon_get_price/blob/master/getprice.rb

Winston Kotzan answered Oct 12 '22 23:10
