Trying to extract attribute values using Nokogiri with custom pseudoclass CSS selectors

Having loaded an (X)HTML page, I'm trying to get the value of a meta tag's "content" attribute. For example, given:

<meta name="author" content="John Smith" />

I'd like to extract the value "John Smith".

I know how to do that using XPath, and I understand that CSS was meant primarily for element selection, but Nokogiri supports defining custom CSS pseudoclasses, which I thought could be used as follows:

class CSSext
  def attr(nodeset, tag)
    nodeset.first.attribute_nodes.find_all {|node| node.name == tag}
  end
end

doc = Nokogiri::HTML(open(someurl))
doc.css("meta[name='author']:attr('content')", CSSext.new)

However, this returns the same result as

doc.css("meta[name='author']")

What gives? Nokogiri uses the same engine underneath for both CSS and XPath searches, so anything that's possible in XPath should be doable in CSS. How should I go about extracting the attribute value?

asked Jan 07 '13 16:01 by user1955506


1 Answer

Why not just use this?

doc.at("meta[name='author']")['content']

As far as I understand, pseudoclasses can only filter the nodeset; they cannot replace it with some other value, such as the value of one of a node's attributes.

answered Oct 28 '22 15:10 by akuhn