I know that the function count can be used to count the number of elements in a given sequence, like this:
count(result/actors/actor)
in this XML:
<result>
<actors>
<actor id="00000015">Anderson, Jeff</actor>
<actor id="00000030">Bishop, Kevin</actor>
<actor id="0000000f">Bonet, Lisa</actor>
<actor id="916503207">Parillaud, Anne</actor>
<actor id="916503208">Pitt, Brad</actor>
<actor id="916503209">Freeman, Morgan</actor>
<actor id="916503211">Domingo, Placido</actor>
<actor id="916503210">Sharif, Omar</actor>
<actor id="1337">Doqumenteriet2011</actor>
</actors>
</result>
But what if I want to know how many times a value occurs in a given sequence?
For example, how would I find out how many movies each actor (actorRef) appears in, given the following XML:
<videos>
<video id="id1235AA0">
<title>The Fugitive</title>
<actorRef>00000003</actorRef>
<actorRef>00000006</actorRef>
</video>
<video id="id1244100">
<title>Enemy of the State</title>
<actorRef>00000009</actorRef>
<actorRef>0000000c</actorRef>
<actorRef>0000000f</actorRef>
<actorRef>00000012</actorRef>
</video>
<video id="id124E230">
<title>Clerks</title>
<actorRef>00000015</actorRef>
<actorRef>00000018</actorRef>
<actorRef>0000001b</actorRef>
</video>
I can easily produce a list of all the appearing actors, and even have them appear as many times in my produced sequence as in the XML:
result/videos//actorRef
but I am not able to do anything similar to what for example COUNT() and GROUP BY do together in SQL to get a list of the actors by count of their multiplicity in the sequence produced by the above line of XQuery.
How can I produce this list?
PS: The end goal is to find the actors that appeared in the most movies.
Here is a pure XPath 2.0 expression (XPath 2.0 is a proper subset of XQuery) that produces the sequence of actorRef
values identifying the actors that participated in the maximum number of movies:
for $maxMovies in
      max(for $actorId in distinct-values(/*/*/actorRef)
            return
              count(index-of(/*/*/actorRef, $actorId))
          )
  return
    (/*/*/actorRef)[index-of(/*/*/actorRef, .)[$maxMovies]]/string()
When this expression is evaluated on the following source XML document:
<videos>
<video id="id1235AA0">
<title>The Fugitive</title>
<actorRef>00000003</actorRef>
<actorRef>00000009</actorRef>
<actorRef>0000000x</actorRef>
</video>
<video id="id1244100">
<title>Enemy of the State</title>
<actorRef>00000009</actorRef>
<actorRef>0000000c</actorRef>
<actorRef>0000000f</actorRef>
<actorRef>00000012</actorRef>
</video>
<video id="id124E230">
<title>Clerks</title>
<actorRef>00000015</actorRef>
<actorRef>00000018</actorRef>
<actorRef>0000001b</actorRef>
</video>
</videos>
The correct, wanted result is produced:
00000009
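The final predicate is the part that tends to puzzle people, so here is a small illustration. It uses a made-up sequence of strings rather than the document above, and the names $seq and $max are only for this sketch. index-of() returns every position at which a value occurs, and a predicate whose value is a single integer is treated as a positional test, so only the item sitting at the $max-th occurrence of its own value survives:

```xquery
(: A sketch with a hypothetical sequence, not the actorRef data. :)
let $seq := ("a", "b", "a", "c")
let $max := 2   (: "a" occurs twice, every other value once :)
return (
  (: all positions at which "a" occurs: 1, 3 :)
  index-of($seq, "a"),

  (: For each item, index-of($seq, .)[$max] is the position of the
     $max-th occurrence of that item's value, or empty if there are
     fewer occurrences. A numeric predicate keeps the item only when
     its own position equals that number, so "a" is returned once. :)
  $seq[index-of($seq, .)[$max]]
)
```

This is also why each actor with the maximum count shows up only once in the result, even though the value occurs several times in the input.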
Using XPath 3.0 (a proper subset of XQuery 3.0), this can be written more compactly:
let $vSeq := /*/*/actorRef/string()
  return
    for $maxMovies in
          max(for $actorId in distinct-values($vSeq)
                return
                  index-of($vSeq, $actorId) ! last()
              )
      return
        $vSeq[index-of($vSeq, .)[$maxMovies]]
And this can be shortened even further using the simple mapping operator (!) to avoid any for-expression:
let $vSeq := /*/*/actorRef/string(),
$maxOccurs :=
max(distinct-values($vSeq) ! count(index-of($vSeq, .)) )
return
$vSeq[index-of($vSeq, .)[$maxOccurs]]
This is the kind of question that a document store handles poorly when you are only storing the list of actors in each video. I'd suggest also storing the list of videos that each actor is part of. Then you'd just have to query for the actor that has the most video elements.
All that said, you can do it with the data you have; it just isn't going to be all that fast. You first need to get a distinct list of actors. Then, for each actor, filter the videos that have that actor and do a count, and then order by count.
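(: $results is assumed to be bound to the parent of the videos element :)
let $actors := fn:distinct-values($results/videos/video/actorRef)
for $actor in $actors
let $count := fn:count($results/videos/video[actorRef = $actor])
order by $count descending (: highest count first :)
return ($actor, $count)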
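If your processor supports XQuery 3.0, the group by clause is probably the closest match to COUNT() with GROUP BY in SQL. The following is only a sketch: it assumes the document is the context item, that actorRef elements sit directly under video as in the question, and that an actor is listed at most once per video (otherwise it counts references rather than movies). The actor element and its attributes in the output are invented for illustration.

```xquery
(: A sketch, assuming an XQuery 3.0 processor and the document as context item. :)
for $ref in //video/actorRef
let $id := string($ref)
group by $id
(: After grouping, $ref holds every actorRef element that shares this $id. :)
let $n := count($ref)
order by $n descending, $id
return <actor id="{$id}" movies="{$n}"/>
```

The first item returned is then an actor with the highest count; ties follow immediately after it, ordered by id.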