I'm new to programming, so please excuse my inexperience. I'm using Nokogiri to scrape a police crime log. Here is the code:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = "http://www.sfsu.edu/~upd/crimelog/index.html"
doc = Nokogiri::HTML(open(url))
puts doc.at_css("title").text
doc.css(".brief").each do |brief|
  puts brief.at_css("h3").text
end
I used the SelectorGadget bookmarklet to find the CSS selector for the log (.brief). When I call brief.at_css("h3"), I get all of the h3 tags with their content inside.
However, if I add the .text method to strip the tags, I get a NoMethodError.
Is there any reason why this is happening? What am I missing? Thanks!
A NoMethodError is a common Ruby error. It means the method you are calling is not defined on the receiving object. When the message reads "undefined method ... for nil:NilClass", the receiver is nil: the call that was expected to return an object returned nothing.
To clarify, if you look at the structure of the HTML source you will see that the very first occurrence of <div class="brief"> does not have a child h3 tag (it actually only has a child <p> tag).
The Nokogiri docs say this about at_css:
at_css(*rules)
Search this node for the first occurrence of CSS rules. Equivalent to css(rules).first. See Node#css for more information.
As the docs state, calling at_css(*rules) is equivalent to css(rules).first. When there is a match (your .brief contains an h3), a Nokogiri::XML::Element object is returned, which responds to text. If your .brief does not contain an h3, nil (the only instance of NilClass) is returned, which of course does not respond to text.
So if we call css(rules) (not at_css as you have), we get a Nokogiri::XML::NodeSet object returned, which has the text() method defined as follows (notice the alias):
# Get the inner text of all contained Node objects
def inner_text
  collect{|j| j.inner_text}.join('')
end
alias :text :inner_text
Because the class is Enumerable, it iterates over its children, calling their inner_text method, and joins the results together. An empty NodeSet therefore returns an empty string instead of raising an error.
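To see why collect followed by join is safe on an empty collection, here is a rough sketch in plain Ruby. FakeNode and FakeNodeSet are hypothetical stand-ins written for this illustration; they are not Nokogiri classes, and only the inner_text definition is copied from the snippet above:

```ruby
# Hypothetical stand-in for a node; only inner_text matters here.
FakeNode = Struct.new(:inner_text)

# Hypothetical stand-in for Nokogiri::XML::NodeSet.
class FakeNodeSet
  include Enumerable

  def initialize(nodes)
    @nodes = nodes
  end

  # Enumerable builds collect on top of each.
  def each(&block)
    @nodes.each(&block)
  end

  # Same body as the Nokogiri snippet quoted above.
  def inner_text
    collect { |j| j.inner_text }.join('')
  end
  alias text inner_text
end

empty_set = FakeNodeSet.new([])
full_set  = FakeNodeSet.new([FakeNode.new("Theft"), FakeNode.new(" on campus")])

p empty_set.text # ""
p full_set.text  # "Theft on campus"
```

With no elements, collect returns an empty array and join turns that into an empty string, so there is never a nil to call a method on.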
Therefore you can either perform a nil? check or, as @floatless correctly stated, just use the css method.
You just need to replace at_css with css and everything should be okay.