Extracting HTML5 data attributes from a tag

I want to extract all the HTML5 data attributes from a tag, just like this jQuery plugin.

For example, given:

<span data-age="50" data-location="London" class="highlight">Joe Bloggs</span>

I want to get a hash like:

{ 'data-age' => '50', 'data-location' => 'London' }

I was originally hoping to use a wildcard as part of my CSS selector, e.g.

Nokogiri(html).css('span[@data-*]').size

but it seems that isn't supported.

Andy Waite asked Mar 17 '12 22:03

1 Answer

Option 1: Grab all data attributes

If all you need is to list the data attributes on the page's span elements, here's a one-liner:

Hash[doc.xpath("//span/@*[starts-with(name(), 'data-')]").map{|e| [e.name,e.value]}]

Output:

{"data-age"=>"50", "data-location"=>"London"}

Option 2: Group results by tag

If you want to group your results by tag (perhaps you need to do additional processing on each tag), you can do the following:

tags = []
datasets = "@*[starts-with(name(), 'data-')]"

# If you want any element, replace "span" with "*"
doc.xpath("//span[#{datasets}]").each do |tag|
  tags << Hash[tag.xpath(datasets).map { |a| [a.name, a.value] }]
end

Then tags is an array of hashes, one per matching tag, each mapping that tag's data attribute names to their values.

Option 3: Behavior like the jQuery datasets plugin

If you'd prefer the plugin-like approach, the following will give you a dataset method on every Nokogiri node.

module Nokogiri
  module XML
    class Node
      def dataset
        Hash[self.xpath("@*[starts-with(name(), 'data-')]").map{|a| [a.name,a.value]}]
      end
    end
  end
end

Then you can find the dataset for a single element:

doc.at_css("span").dataset

Or get the dataset for a group of elements:

doc.css("span").map(&:dataset)

Example:

The following is the behavior of the dataset method above. Given the following lines in the HTML:

<span data-age="50" data-location="London" class="highlight">Joe Bloggs</span>
<span data-age="40" data-location="Oxford" class="highlight">Jim Foggs</span>

The output would be:

[
 {"data-age"=>"50", "data-location"=>"London"},
 {"data-age"=>"40", "data-location"=>"Oxford"}
]
Mark Thomas answered Oct 07 '22 18:10