As the title says, I have some DOM manipulation tasks. For example, I want to:
- find all H1 elements that are blue.
- find all text that has a size of 12px.
- etc.
How can I do it with Rails?
Thank you.. :)
Update
I have been doing some research on extracting web page content, based on this paper: http://www.springerlink.com/index/A65708XMUR9KN9EA.pdf
The summary of the steps is:
-sorry for my bad english-
If what you're trying to do is manipulate HTML documents inside a Rails application, you should take a look at Nokogiri.
It can search through a document with XPath. The following finds every link (a element) that is a direct child of an h1 and whose class attribute is exactly "blue", then prints its text.
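require 'nokogiri'
require 'open-uri'

# Fetch and parse the page (URI.open is provided by open-uri).
doc = Nokogiri::HTML(URI.open('http://www.stackoverflow.com'))
# Print the text of each matching link.
doc.xpath('//h1/a[@class="blue"]').each do |link|
  puts link.content
end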
However, if what you are trying to do is inspect the DOM of the page currently displayed in the browser, you should take a look at JavaScript and jQuery. Rails runs on the server and can't do that.
http://railscasts.com/episodes/190-screen-scraping-with-nokogiri