
Nokogiri: Running into error "undefined method ‘text’ for nil:NilClass"

Tags: ruby, nokogiri

I'm new to programming, so please excuse my inexperience. I'm using Nokogiri to scrape a police crime log. Here is the code:

require 'rubygems'
require 'nokogiri'
require 'open-uri'

url = "http://www.sfsu.edu/~upd/crimelog/index.html"
doc = Nokogiri::HTML(open(url))
puts doc.at_css("title").text
doc.css(".brief").each do |brief|
 puts brief.at_css("h3").text
end

I used the SelectorGadget bookmarklet to find the CSS selector for the log (.brief). When I pass "h3" to brief.at_css, I get the h3 tags with their content inside.

However, if I add the .text method to strip the tags, I get a NoMethodError.

Is there any reason why this is happening? What am I missing? Thanks!

aboutaaron asked Aug 22 '11 21:08




2 Answers

To clarify: if you look at the structure of the HTML source, you will see that the very first occurrence of <div class="brief"> does not have a child h3 tag (it only has a child <p> tag).

The Nokogiri Docs say that

at_css(*rules)

Search this node for the first occurrence of CSS rules. Equivalent to css(rules).first. See Node#css for more information.

The docs state that at_css(*rules) is equivalent to css(rules).first. When there is a match (your .brief contains an h3), a Nokogiri::XML::Element object is returned, which responds to text. When your .brief does not contain an h3, nil is returned, and nil (an instance of NilClass) does not respond to text.

So if we call css(rules) (not at_css, as you have), we get a Nokogiri::XML::NodeSet object back, which defines the text method as follows (notice the alias):

  # Get the inner text of all contained Node objects
  def inner_text
    collect{|j| j.inner_text}.join('')
  end
  alias :text :inner_text

Because the class is Enumerable, it iterates over the nodes it contains, calls inner_text on each, and joins the results. An empty NodeSet therefore returns an empty string instead of raising an error.
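The effect of that definition on an empty result can be sketched without Nokogiri. The TinyNode and TinyNodeSet classes below are hypothetical stand-ins written for illustration, not Nokogiri's real classes; they only mirror the collect-and-join shape of the snippet above:

```ruby
# Hypothetical stand-in for a node: it just holds some text.
TinyNode = Struct.new(:inner_text)

# Hypothetical stand-in for NodeSet: Enumerable over its nodes, with the
# same collect-and-join definition of inner_text and the same alias.
class TinyNodeSet
  include Enumerable

  def initialize(nodes)
    @nodes = nodes
  end

  # Enumerable builds collect, first, and friends on top of each.
  def each(&block)
    @nodes.each(&block)
  end

  def inner_text
    collect { |node| node.inner_text }.join('')
  end
  alias text inner_text
end

matches = TinyNodeSet.new([TinyNode.new("Theft"), TinyNode.new(" report")])
empty = TinyNodeSet.new([])

puts matches.text.inspect # joined text of every node
puts empty.text.inspect   # joining nothing gives an empty string
puts empty.first.inspect  # first on an empty set is nil
```

Calling text on the whole set is safe even when nothing matched, whereas taking first and then calling text on it is not.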

Therefore, you can either perform a nil? check or, as @floatless correctly stated, just use the css method.

Paul.s answered Oct 04 '22 02:10


You just need to replace at_css with css and everything should be okay.

Daniel O'Hara answered Oct 04 '22 02:10