
How can I make empty tags self-closing with Nokogiri?

Tags: xml, ruby, nokogiri

I've created an XML template in ERB. I fill it in with data from a database during an export process.

In some cases, there is a null value, in which case an element may be empty, like this:

<someitem>

</someitem>

In that case, the client receiving the export wants it to be converted into a self-closing tag:

<someitem/>

I'm trying to see how to get Nokogiri to do this, but I don't see it yet. Does anybody know how to make empty XML tags self-closing with Nokogiri?

Update

A regex was sufficient to do what I specified above, but the client now also wants tags whose children are all empty to be self-closing. So this:

<someitem>
  <subitem>

  </subitem>
  <subitem>

  </subitem>
</someitem>

...should also become:

<someitem/>

I think that this will require using Nokogiri.

asked Mar 28 '11 13:03 by Nathan Long


2 Answers

Search for

<([^>]+)>\s*</\1>

and replace with

<\1/>

In Ruby:

result = subject.gsub(/<([^>]+)>\s*<\/\1>/, '<\1/>')  # subject holds the generated XML string

Explanation:

<       # Match opening bracket
(       # Match and remember...
 [^>]+  # One or more characters except >
)       # End of capturing group
>       # Match closing bracket
\s*     # Match optional whitespace & newlines
<       # Match opening bracket
/       # Match /
\1      # Match the contents of the opening tag
>       # Match closing bracket
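Below is a runnable sketch of the substitution above. One limitation is worth knowing. The capture group holds everything between < and >, so a tag with attributes such as <someitem id="1"> never matches, because the closing tag would have to repeat the attributes.

The second pattern is a hypothetical variant that captures the tag name separately to handle that case. It assumes tag names contain only word characters, colons, dots and hyphens.

Neither pattern handles the nested case from the question's update. After one pass, the parent still contains <subitem/> elements.

```ruby
# The answer's pattern: group 1 is everything inside the opening tag.
EMPTY_TAG = /<([^>]+)>\s*<\/\1>/

def self_close(xml)
  xml.gsub(EMPTY_TAG, '<\1/>')
end

# Hypothetical variant: group 1 is the tag name, optional group 2 the
# attributes (an unmatched group is substituted as an empty string).
EMPTY_TAG_WITH_ATTRS = /<([\w:.-]+)(\s[^>]*)?>\s*<\/\1>/

def self_close_with_attrs(xml)
  xml.gsub(EMPTY_TAG_WITH_ATTRS, '<\1\2/>')
end
```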
answered Sep 28 '22 10:09 by Tim Pietzcker


A couple of questions:

  1. <foo></foo> is equivalent to <foo />, so why worry about such a small detail? If it matters because the text node between the two tags is "\n", add a test to your ERB template that checks the value that would go there and, if it isn't set, outputs the self-closing tag instead. See "Yak shaving".
  2. Why involve Nokogiri? You should be able to generate correct XML in ERB since you're in control of the template.
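Here is a sketch of the ERB-side check from point 1, using Ruby's standard ERB library. The render_item helper and the TEMPLATE constant are assumptions for illustration. So is the rule that a nil or blank value produces <someitem/>. The value is escaped with ERB::Util.html_escape so that characters like & stay well-formed.

```ruby
require "erb"

# Hypothetical template fragment: a nil or blank value (an assumption)
# becomes a self-closing tag; anything else is escaped and wrapped.
TEMPLATE = <<~XML
  <% if value.nil? || value.to_s.strip.empty? %><someitem/><% else %><someitem><%= ERB::Util.html_escape(value) %></someitem><% end %>
XML

# Hypothetical helper: render the fragment for one value.
def render_item(value)
  ERB.new(TEMPLATE).result(binding).strip
end
```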

EDIT: Nokogiri doesn't rewrite parsed XML unless it has to. I suspect you'd have to remove the node in question and then reinsert it as an empty node to get Nokogiri to output what you want.

answered Sep 28 '22 10:09 by the Tin Man