 

XmlSlurper - list text and regular nodes of an XHTML document

I am using Groovy's XmlSlurper to parse an XHTML document (or a pseudo-XHTML one), and I'm trying to get to the text nodes of the document but can't figure out how. Here is the code:

import groovy.xml.XmlSlurper   // Groovy 3+; on Groovy 2.x this class is groovy.util.XmlSlurper

xmlText = '''
<TEXTFORMAT INDENT="10" LEADING="-5">
  <P ALIGN="LEFT">
    <FONT FACE="Garamond Premr Pro" SIZE="20" COLOR="#001200" LETTERSPACING="0" KERNING="0">
      Less is more! this 
      <FONT COLOR="#FFFF00">should be all</FONT>
      the 
      <FONT COLOR="#00FF00"> words OR should some </FONT>
      OTHER WORDS will be there?
    </FONT>
  </P>
</TEXTFORMAT>
'''
records = new XmlSlurper().parseText(xmlText)
records.P.FONT.children().eachWithIndex {it, index -> println "${index} - ${it}"} 

Which prints the following output:

0 - should be all 
1 -  words OR should some

But I want it to print the content of the text nodes as well, so the desired output is:

0 - Less is more! this
1 - should be all
2 - the 
3 - words OR should some
4 - OTHER WORDS will be there?

Any ideas?

talg asked Nov 18 '25 02:11


1 Answer

It looks like XmlSlurper does NOT have a separate method to retrieve "mixed content" (text nodes interleaved with child elements).

There is an open item in the Groovy JIRA to add a method supporting mixed content.
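A possible workaround, which is an assumption on my part and not something the answer above confirms, is to parse the same text with XmlParser instead of XmlSlurper. The node returned by XmlParser exposes `children()` as a list that keeps mixed content in document order: plain Strings for text nodes and nested node objects for elements. The sketch below assumes Groovy 3 or later (`groovy.xml.XmlParser`; on Groovy 2.x the class is `groovy.util.XmlParser`), and the variable name `segments` is hypothetical. It trims every piece and drops whitespace-only ones itself, so it does not depend on the parser's whitespace settings.

```groovy
import groovy.xml.XmlParser   // Groovy 3+; groovy.util.XmlParser on Groovy 2.x

def xmlText = '''
<TEXTFORMAT INDENT="10" LEADING="-5">
  <P ALIGN="LEFT">
    <FONT FACE="Garamond Premr Pro" SIZE="20" COLOR="#001200" LETTERSPACING="0" KERNING="0">
      Less is more! this 
      <FONT COLOR="#FFFF00">should be all</FONT>
      the 
      <FONT COLOR="#00FF00"> words OR should some </FONT>
      OTHER WORDS will be there?
    </FONT>
  </P>
</TEXTFORMAT>
'''

def root = new XmlParser().parseText(xmlText)

// The outer FONT element, located with the same GPath as in the question.
def outerFont = root.P.FONT[0]

// children() returns mixed content in document order:
// Strings for text nodes, node objects for nested elements.
def segments = outerFont.children().collect { child ->
    child instanceof String ? child.trim() : child.text().trim()
}.findAll { it }   // drop whitespace-only pieces (empty strings are falsy)

segments.eachWithIndex { text, index -> println "${index} - ${text}" }
```

Run as a script, this should print the five lines listed as the desired output in the question. The trade-off is that XmlParser builds a mutable in-memory tree rather than XmlSlurper's lazily evaluated GPath result, which is usually fine for small XHTML fragments like this one.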

Kartik Shah answered Nov 19 '25 20:11


