Perhaps this is nitpicky, but I have to ask.
I'm using Nokogiri to parse XML, remove certain tags, and write over the original file with the results. Using .remove
leaves blank lines in the XML. I'm currently using a regex to get rid of the blank lines. Is there some built-in Nokogiri method I should be using?
Here's what I have:
require 'nokogiri'
io_path = "/path/to/metadata.xml"
io = File.read(io_path)
document = Nokogiri::XML(io)
document.xpath('//artwork_files', '//tracks', '//previews').remove
# write to file and remove blank lines with a regular expression
File.open(io_path, 'w') do |x|
  x << document.to_s.gsub(/\n\s+\n/, "\n")
end
There is no built-in method for this, but we can add one:
class Nokogiri::XML::Document
  def remove_empty_lines!
    xpath("//text()").each { |text| text.content = text.content.gsub(/\n(\s*\n)+/, "\n") }
    self
  end
end
This removed blank lines for me:
doc.xpath('//text()').select { |t| t.to_s.strip.empty? }.each(&:remove)
Doing a substitution on each text node didn't work for me either. The problem is that after removing nodes, text nodes that just became adjacent don't get merged. When you loop over text nodes, each one has only a single newline, but there are now several of them in a row.
One rather messy solution I found was to reparse the document:
xml = Nokogiri::XML.parse xml.to_xml
Now adjacent text nodes will be merged and you can do regexes on them.
But this looks like it's probably a better option:
https://github.com/tobym/nokogiri-pretty