How to manipulate DOM with Ruby on Rails

As the title says, I have some DOM manipulation tasks. For example, I want to:

  - find all H1 elements that are colored blue
  - find all text with a font size of 12px
  - etc.

How can I do it with Rails?

Thank you.. :)

Update

I have been researching how to extract web page content, based on this paper: http://www.springerlink.com/index/A65708XMUR9KN9EA.pdf

The steps, in summary, are:

  1. get the URL of the web page I want to extract content from (a single web page)
  2. grab some elements from the page based on visual rules (e.g. grab all H1 elements that are colored blue)
  3. process the elements with my algorithm
  4. save the result into my database.

-sorry for my bad english-

andrisetiawan asked Oct 23 '09 03:10


2 Answers

If what you're trying to do is manipulate HTML documents inside a Rails application, you should take a look at Nokogiri.

It can search the document with XPath (CSS selectors are supported too). The following prints the text of every link that sits directly inside an h1 and whose class attribute is exactly "blue".

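require 'nokogiri'
require 'open-uri'

# URI.open is required on Ruby 3.0+, where Kernel#open no longer fetches URLs
doc = Nokogiri::HTML(URI.open('http://www.stackoverflow.com'))
# Select every <a> directly inside an <h1> whose class attribute is exactly "blue"
doc.xpath('//h1/a[@class="blue"]').each do |link|
  puts link.content
end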

However, if what you actually want is to manipulate the DOM of the page currently displayed in the browser, you should take a look at JavaScript and jQuery. Rails runs on the server and can't do that.

Damien MATHIEU answered Sep 27 '22 01:09


This Railscasts episode covers screen scraping with Nokogiri: http://railscasts.com/episodes/190-screen-scraping-with-nokogiri

Ram Kumar Hariharan answered Sep 23 '22 01:09