
XPath with recursive definitions

Tags: xml, xpath, dtd

I have a DTD like this:

    <!ELEMENT Root (Thread*)>
    <!ELEMENT Thread (ThreadId, Message)>
    <!ELEMENT Replies (Message+)>
    <!ELEMENT Message (timestamp, sender, recipient, subject, text, Replies?)>

So a thread has one message, and that message can have a Replies node, which in turn contains messages, and so on down to the bottom of the structure.

Now what I want to do is to first retrieve the ID of the thread with the most messages and then retrieve the ID of the thread with the longest chain of nested replies.

It feels like a recursive problem, but I'm not able to approach it in XPath. So far I have tried something like this:

      For $thread in //thread
      Count(descendant-or-self::$thread/message) 

For each thread I try to count the number of child message nodes, but this solution counts all the child nodes of the thread, including the Replies nodes.

I feel lost with this kind of problem, as I cannot figure out what to do in these 'recursive situations'.

Pagli asked Apr 11 '26 22:04



1 Answer

Assuming XPath 3.0, you can use, for example,

let $max := max(/Root/Thread/count(.//Message))
return /Root/Thread[count(.//Message) eq $max]/ThreadId

to find the ID(s) of the thread(s) with the most messages, and I think

let $max := max(/Root/Thread/Message//Replies[not(Message/Replies)]/count(ancestor::Replies))
return /Root/Thread[Message//Replies[not(Message/Replies)]/count(ancestor::Replies) = $max]/ThreadId

to find the ID(s) of the thread(s) with the longest chain of nested replies.
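A possible simplification, offered as a sketch rather than a replacement: because max already picks out the deepest element, the leaf filter can be dropped if each Replies element counts itself via ancestor-or-self. The value bound to $depth (a name introduced here for illustration) is then the nesting depth itself, 1 for a thread with only direct replies, rather than one less. It assumes the same element spelling (Message, Replies) as the queries above.

```xpath
(: $depth is the largest number of nested Replies levels found in any thread :)
let $depth := max(/Root/Thread//Replies/count(ancestor-or-self::Replies))
return /Root/Thread[.//Replies/count(ancestor-or-self::Replies) = $depth]/ThreadId
```

If no thread has any replies, max returns the empty sequence and the query selects nothing, which matches the behaviour of the version with the leaf filter.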

XPath 2.0 has no let expressions, so you would have to inline the expression bound to $max in my samples wherever the variable is referenced.
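Inlined for XPath 2.0, the two queries might look like the sketch below (each expression is evaluated on its own). The last expression is an alternative not mentioned above: XPath 2.0 does have for expressions, and iterating over the single value returned by max binds $max much as let would.

```xpath
(: most messages: the maximum is inlined into the predicate :)
/Root/Thread[count(.//Message) eq max(/Root/Thread/count(.//Message))]/ThreadId

(: longest chain of nested replies, inlined the same way :)
/Root/Thread[Message//Replies[not(Message/Replies)]/count(ancestor::Replies)
             = max(/Root/Thread/Message//Replies[not(Message/Replies)]/count(ancestor::Replies))]/ThreadId

(: alternative: a for expression over a one-item sequence stands in for let :)
for $max in max(/Root/Thread/count(.//Message))
return /Root/Thread[count(.//Message) eq $max]/ThreadId
```

The inlined forms recompute the maximum for every Thread unless the processor optimizes it away, so the for variant may be preferable on large documents.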

In XPath 3.1 you have a sort function (https://www.w3.org/TR/xpath-functions-31/#func-sort), so instead of computing the maximum and selecting the items with the maximum you could sort and take the last item (which yields a single thread even when several tie for the maximum), e.g.

sort(/Root/Thread, (), function($t) { max($t/Message//Replies[not(Message/Replies)]/count(ancestor::Replies)) })[last()]/ThreadId

for the second, more complex query or

sort(/Root/Thread, (), function($t) { count($t//Message) })[last()]/ThreadId

for the first one.

Martin Honnen answered Apr 16 '26 14:04
