
How can I create a nokogiri case insensitive Xpath selector?

I'm using nokogiri to select the meta element whose name attribute is 'keywords', like this:

puts page.parser.xpath("//meta[@name='keywords']").to_html

One of the pages I'm working with spells the name with a capital "K", so I'd like to make the query case insensitive. It should match both of these:

<meta name="keywords"> AND <meta name="Keywords"> 

So, my question is: What is the best way to make a nokogiri selection case insensitive?

EDIT: Tomalak's suggestion below works great for this specific problem. I'd also like to use this example to understand nokogiri better, though, and I have a couple of questions that I haven't been able to answer by searching. For example, are the regex 'pseudo classes' described in the Nokogiri docs appropriate for a problem like this?

I'm also curious about the matches?() method in nokogiri. I have not been able to find any clarification on the method. Does it have anything to do with the 'matches' concept in XPath 2.0 (and therefore could it be used to solve this problem)?

Thanks very much.

Rick asked Feb 17 '10 09:02



2 Answers

Nokogiri allows custom XPath functions. The nokogiri docs that you link to show an inline class definition for when you're only using it once. If you have a lot of custom functions or if you use the case-insensitive match a lot, you may want to define it in a class.

class XpathFunctions
  # Returns the nodes in node_set whose string value equals str_to_match, ignoring case.
  def case_insensitive_equals(node_set, str_to_match)
    node_set.find_all { |node| node.to_s.downcase == str_to_match.to_s.downcase }
  end
end

Then call it like any other XPath function, passing in an instance of your class as the 2nd argument.

page.parser.xpath("//meta[case_insensitive_equals(@name,'keywords')]",
                  XpathFunctions.new).to_html

In your Ruby method, node_set will be bound to a Nokogiri::XML::NodeSet. In the case where you're passing in an attribute value like @name, it will be a NodeSet with a single Nokogiri::XML::Attr. So calling to_s on it gives you its value. (Alternatively, you could use node.value.)
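As a rough sketch of the node.value variant mentioned above: the handler below is plain Ruby, so it can be tried without a parsed document. FakeAttr is a hypothetical stand-in for Nokogiri::XML::Attr that I made up for illustration (it only responds to value), and the plain arrays stand in for the NodeSet that Nokogiri would pass.

```ruby
# Handler object; with Nokogiri an instance would be passed as the last argument to #xpath.
class XpathFunctions
  # node_set: in Nokogiri, a NodeSet of Attr nodes (e.g. from @name).
  # Returns the attributes whose value equals str_to_match, ignoring case.
  def case_insensitive_equals(node_set, str_to_match)
    wanted = str_to_match.to_s.downcase
    node_set.find_all { |node| node.value.downcase == wanted }
  end
end

# Hypothetical stand-in for Nokogiri::XML::Attr, for illustration only.
FakeAttr = Struct.new(:value)

handler = XpathFunctions.new

# A <meta name="Keywords"> element would hand the function one attribute.
matched = handler.case_insensitive_equals([FakeAttr.new("Keywords")], "keywords")
puts matched.length

# A <meta> without a name attribute would hand it an empty set.
unmatched = handler.case_insensitive_equals([], "keywords")
puts unmatched.empty?
```

An empty result is what makes the predicate fail for elements that have no name attribute or a different one, so those elements drop out of the selection.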

Unlike XPath's translate(), where you have to list every character you want converted, this handles every character that Ruby's String#downcase can case-map (in Ruby 2.4 and later, that includes non-ASCII letters).
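To illustrate the difference, here is a small sketch in plain Ruby. ascii_only_lower is a hypothetical helper that mimics a translate() call listing only A-Z; it is not part of Nokogiri or XPath. The comparison assumes Ruby 2.4 or later, where downcase does full Unicode case mapping.

```ruby
# Mimics an XPath 1.0 translate() call that lists only A-Z:
# letters outside that range pass through unchanged.
def ascii_only_lower(str)
  str.tr('A-Z', 'a-z')
end

word = "ÉCOLE"

puts ascii_only_lower(word)  # the É is left as-is
puts word.downcase           # Unicode-aware on Ruby 2.4+
```

So a translate()-based query needs É and é added by hand, while the Ruby handler picks them up without any extra work.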

Also, if you're interested in doing other things besides case-insensitive matching that XPath 1.0 doesn't support, it's just Ruby at this point. So this is a good starting point.

Jonathan Tran answered Nov 03 '22 22:11



Wrapped for legibility:

puts page.parser.xpath("
  //meta[
    translate(
      @name, 
      'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 
      'abcdefghijklmnopqrstuvwxyz'
    ) = 'keywords'
  ]
").to_html

There is no "to lower case" function in XPath 1.0, so you have to use translate() for this kind of thing. Add accented letters to both character lists as necessary.
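If you need this in more than one query, the translate() expression can be generated rather than typed out each time. The following is only a sketch: ci_attr_equals and its extra_upper / extra_lower parameters are names I made up for illustration, and it assumes the value contains no single quotes (it does no escaping).

```ruby
UPPER = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
LOWER = 'abcdefghijklmnopqrstuvwxyz'

# Builds an XPath 1.0 predicate that compares an attribute to value, ignoring case.
# extra_upper / extra_lower: additional letter pairs (e.g. accented ones), in matching order.
# Assumption: value contains no single quotes; nothing is escaped here.
def ci_attr_equals(attr, value, extra_upper: '', extra_lower: '')
  "translate(@#{attr}, '#{UPPER}#{extra_upper}', '#{LOWER}#{extra_lower}') = '#{value.downcase}'"
end

xpath = "//meta[#{ci_attr_equals('name', 'keywords')}]"
puts xpath
```

The resulting string can be passed to page.parser.xpath in place of the hand-written query above; for accented names, pass the extra pairs, e.g. extra_upper: 'É', extra_lower: 'é'.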

Tomalak answered Nov 03 '22 22:11
