This is what I want to do:
Remove "span" nodes with a class of "none".
Remove "extra" nodes but keep the text inside them.
Remove any "br" nodes and replace them with "p" nodes, so the text splits into separate paragraphs.
This is the input:
<p class="normal">
  <span class="none">
    <extra>Some text goes here</extra>
  </span>
  <span class="none">
    <br/>
  </span>
  <span class="none">
    <extra>Some other text goes here</extra>
    <br/>
  </span>
</p>
This is the output I'd like to achieve:
<p class="normal">Some text goes here</p>
<p class="normal">Some other text goes here</p>
I've tried this so far:
doc.xpath('html/body/p/span').each do |span|
  span.attribute_nodes.each do |a|
    if a.value == "none"
      span.children.each do |child|
        span.parent << child
      end
      span.remove
    end
  end
end
But this is the output I'm getting. It's not even in the right order:
<p class="normal"><br /><br />Some text goes hereSome other text goes here</p>
Try this out:
require 'rubygems'
require 'nokogiri'
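doc = Nokogiri::XML(DATA)

# Unwrap span.none and extra nodes: replace each one with its own children.
doc.css("span.none, extra").each do |span|
  span.swap(span.children)
end

# via http://stackoverflow.com/questions/8937846/how-do-i-wrap-html-untagged-text-with-p-tag-using-nokogiri
# Wrap every non-blank text node that sits next to a <br> in its own <p>.
doc.search("//br/preceding-sibling::text()|//br/following-sibling::text()").each do |node|
  if node.content !~ /\A\s*\Z/
    node.replace(doc.create_element('p', node))
  end
end

# The <br> nodes are no longer needed once the text is wrapped.
doc.css('br').remove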
puts doc
__END__
<p class="normal">
<span class="none">
<extra>Some text goes here</extra>
</span>
<span class="none">
<br/>
</span>
<span class="none">
<extra>Some other text goes here</extra>
<br/>
</span>
</p>
Which prints:
<?xml version="1.0"?>
<p class="normal">
<p>Some text goes here</p>
<p>Some other text goes here</p>
</p>