
A better way to remove blank lines after Nokogiri Node removal

Tags: xml, ruby, nokogiri

Perhaps this is nitpicky, but I have to ask.

I'm using Nokogiri to parse XML, remove certain tags, and write over the original file with the results. Using .remove leaves blank lines in the XML. I'm currently using a regex to get rid of the blank lines. Is there some built-in Nokogiri method I should be using?

Here's what I have:

require 'nokogiri' # lowercase: the require path is case-sensitive on most systems
io_path = "/path/to/metadata.xml"
io = File.read(io_path)
document = Nokogiri::XML(io)
document.xpath('//artwork_files', '//tracks', '//previews').remove

# write to file and remove blank lines with a regular expression
File.open(io_path, 'w') do |x|
  x << document.to_s.gsub(/\n\s+\n/, "\n")
end
michaelmichael asked Nov 24 '09 20:11


3 Answers

There is no built-in method, but we can add one:

class Nokogiri::XML::Document
  def remove_empty_lines!
    # Collapse runs of blank lines inside every text node, then return the document.
    xpath("//text()").each do |text|
      text.content = text.content.gsub(/\n(\s*\n)+/, "\n")
    end
    self
  end
end
akuhn answered Nov 09 '22 22:11


This removed the blank lines for me:

doc.xpath('//text()').find_all { |t| t.to_s.strip == '' }.each(&:remove)
digitalronin answered Nov 09 '22 23:11


Doing a substitution on each text node didn't work for me either. The problem is that after removing nodes, text nodes that just became adjacent don't get merged. When you loop over text nodes, each one has only a single newline, but there are now several of them in a row.

One rather messy solution I found was to reparse the document:

xml = Nokogiri::XML.parse xml.to_xml

Now adjacent text nodes will be merged and you can do regexes on them.

But this looks like it's probably a better option:

https://github.com/tobym/nokogiri-pretty

Mike Ciul answered Nov 09 '22 22:11