 

Java contextual SAX / StAX parsing

I would like to extract the text of all elements that appear as direct children of the root node. I've had a glance at the standard Java SAX facilities using DefaultHandler, but it doesn't seem to be path aware.

The problem is restricting the result to first-level nodes, not picking out the text nodes.

Is there any non-DOM oriented approach to do this? (Note, the node names are not known in advance)

[EDIT]

Sample input

<root>
   <a>text1</a>
   <b>text2</b>
   <c>text3</c>
   <nested>
       <d>not_text4</d>
       ...
   </nested>
   ...
</root>

Sample output

Map<String, String> map := {
    {a, text1}
    {b, text2}
    {c, text3}
}

Currently solved with a DOM-oriented workaround, although there are libraries that offer a subset of XPath expressions for SAX / StAX.

Johan Sjöberg asked Nov 28 '25 15:11



1 Answer

SAX and StAX indeed aren't path aware by nature as they're event oriented. While it's certainly possible to implement a handler that tracks parsing level, you're probably better off with XPath.
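For reference, the level-tracking handler mentioned above could look roughly like the sketch below. It is only an illustration: the class name FirstLevelTextHandler and its parse helper are made up for this example, and it assumes that "text element" means a direct child of the root that contains no child elements of its own. A depth counter is incremented in startElement and decremented in endElement, and text is collected only while the depth is 2.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of any library.
class FirstLevelTextHandler extends DefaultHandler {
    private final Map<String, String> result = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private int depth = 0;            // 1 = root element, 2 = direct child of root
    private boolean hasChildElement;  // true if the current depth-2 element contains elements

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
        if (depth == 2) {
            // Entering a direct child of the root: start collecting fresh.
            text.setLength(0);
            hasChildElement = false;
        } else if (depth > 2) {
            // Anything deeper means the current first-level element is not text-only.
            hasChildElement = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per text node, so append rather than assign.
        if (depth == 2) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth == 2 && !hasChildElement) {
            result.put(qName, text.toString().trim());
        }
        depth--;
    }

    // Convenience wrapper: parses an XML string and returns name -> text in document order.
    static Map<String, String> parse(String xml) {
        FirstLevelTextHandler handler = new FirstLevelTextHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return handler.result;
    }
}
```

Calling FirstLevelTextHandler.parse(...) on a well-formed version of the sample input should give a map holding a, b and c but not nested. Note that under the assumption above an empty first-level element ends up in the map with an empty string, and a repeated element name overwrites the earlier entry.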

A somewhat more complex tactic might be to write an XSLT transform that retains only the elements you're after and then process the result using SAX or StAX.
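A rough sketch of that XSLT tactic, using the JDK's built-in javax.xml.transform API, is shown here. The stylesheet, the class name XsltPrefilter and the parse helper are assumptions made for the example. The stylesheet keeps only those children of the root that have no child elements, and the transform output is delivered as SAX events through a SAXResult. Keep in mind that a typical XSLT 1.0 processor builds its own in-memory tree of the input, so this may not save memory compared to DOM.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of any library.
class XsltPrefilter {
    // Copies the root element and only those children that contain no elements.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/*'>"
      + "<xsl:copy><xsl:copy-of select='*[not(*)]'/></xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static Map<String, String> parse(String xml) {
        final Map<String, String> result = new LinkedHashMap<>();
        // After filtering, every element at depth 2 is one we want.
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private int depth = 0;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (++depth == 2) {
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (depth == 2) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (depth-- == 2) {
                    result.put(qName, text.toString().trim());
                }
            }
        };
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            // The transform result is pushed to the handler as SAX events.
            transformer.transform(new StreamSource(new StringReader(xml)), new SAXResult(handler));
        } catch (Exception e) {
            throw new IllegalStateException("XSLT filtering failed", e);
        }
        return result;
    }
}
```

The handler here stays simple because the filtering logic lives in the stylesheet; changing which elements are kept only means changing the select expression.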

Don Roby answered Dec 01 '25 06:12



