I have some code that uses Nokogiri, and I am trying to get a node's inner_html
without the comments.
html = Nokogiri::HTML(open(@sql_scripts_url[1])) # using first value of the array
html.css('td[class="ms-formbody"]').each do |node|
  puts node.inner_html # prints comments
end
Since you have not provided any sample HTML or desired output, here's a general solution:
You can select SGML comments in XPath by using the comment()
node test; you can strip them out of the document by calling .remove
on all comment nodes. Illustrated:
require 'nokogiri'
doc = Nokogiri.XML('<r><b>hello</b> <!-- foo --> world</r>')
p doc.inner_html #=> "<r><b>hello</b> <!-- foo --> world</r>"
doc.xpath('//comment()').remove
# The spaces on both sides of the removed comment remain.
p doc.inner_html #=> "<r><b>hello</b>  world</r>"
Note that the above modifies the document destructively to remove the comments. If you wish to keep the original document unmodified, you could alternatively do this:
class Nokogiri::XML::Node
  # Returns inner_html with nodes matching xpath removed from a copy
  def inner_html_reject(xpath='.//comment()')
    dup.tap{ |shadow| shadow.xpath(xpath).remove }.inner_html
  end
end
doc = Nokogiri.XML('<r><b>hello</b> <!-- foo --> world</r>')
p doc.inner_html_reject #=> "<r><b>hello</b>  world</r>"
p doc.inner_html #=> "<r><b>hello</b> <!-- foo --> world</r>"
Finally, note that if you don't need the markup, asking for the text alone
never includes HTML comments:
p doc.text #=> "hello  world"