Given an HTML document in Nokogiri, I want to remove all <p> nodes with no actual text. This includes <p> elements that contain only whitespace and/or <br/> tags. What's the most elegant way to do this?
This is a simpler fix: check each paragraph's text content. A <p> that holds only whitespace and <br> tags has no text content, so it gets removed.
Given the HTML:
"<p> </p><p>Foo</p><p><br/> <br> </p>"
Solution:
document.css('p').each do |p|
  # Ruby on Rails solution (String#blank? comes from ActiveSupport):
  #   p.remove if p.content.blank?
  # Plain Ruby solution, as pointed out by Michael Hartl:
  p.remove if p.content.strip.empty?
end
# document => <p>Foo</p>
I would start with a method like this one (feel free to monkeypatch Nokogiri::XML::Node if you want to):
def is_blank?(node)
  (node.text? && node.content.strip == '') || (node.element? && node.name == 'br')
end
Then continue with another method that checks that all children are blank:
def all_children_are_blank?(node)
  node.children.all? { |child| is_blank?(child) }
  # Here you see the convenience of monkeypatching... sometimes.
end
And finally, get the document and remove every <p> whose children are all blank:
document.css('p').find_all { |p| all_children_are_blank?(p) }.each do |p|
  p.remove
end