Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

XQuery: How to count how many times a value occurs in sequence

I know that the function count can be used to count the number of elements in a given sequence, like this:

count(result/actors/actor)

in this XML:

<result>
    <actors>
        <actor id="00000015">Anderson, Jeff</actor>
        <actor id="00000030">Bishop, Kevin</actor>
        <actor id="0000000f">Bonet, Lisa</actor>
        <actor id="916503207">Parillaud, Anne</actor>
        <actor id="916503208">Pitt, Brad</actor>
        <actor id="916503209">Freeman, Morgan</actor>
        <actor id="916503211">Domingo, Placido</actor>
        <actor id="916503210">Sharif, Omar</actor>
        <actor id="1337">Doqumenteriet2011</actor>
    </actors>
</result>

But what if I want to know how many times a value occurs in a given sequence?

For example if I would like to know how many movies each actor (actorRef) appeared in in the following XML:

<videos>
    <video id="id1235AA0">
        <title>The Fugitive</title>
        <actorRef>00000003</actorRef>
        <actorRef>00000006</actorRef>
    </video>
    <video id="id1244100">
        <title>Enemy of the State</title>
        <actorRef>00000009</actorRef>
        <actorRef>0000000c</actorRef>
        <actorRef>0000000f</actorRef>
        <actorRef>00000012</actorRef>
    </video>
    <video id="id124E230">
        <title>Clerks</title>
        <actorRef>00000015</actorRef>
        <actorRef>00000018</actorRef>
        <actorRef>0000001b</actorRef>
    </video>

I can easily produce a list of all the appearing actors, and even have them appear as many times in my produced sequence as in the XML:

result/videos//actorRef

but I am not able to do anything similar to what for example COUNT() and GROUP BY do together in SQL to get a list of the actors by count of their multiplicity in the sequence produced by the above line of XQuery.

How can I produce this list?

PS: The end goal is to find the actors that appeared the most movies.

like image 434
Sahand Avatar asked Oct 03 '15 17:10

Sahand


2 Answers

Here is a pure XPath 2.0 expression (XPath 2.0 is a proper subset of XQuery), that produces the sequence of actorRef values identifying the actors that participated in maximum number of movies:

 for $maxMovies in 
       max(for $actorId in distinct-values(/*/*/actorRef) 
            return
               count(index-of(/*/*/actorRef, $actorId))
           )
    return 
      (/*/*/actorRef)[index-of(/*/*/actorRef, .)[$maxMovies]]/string()

When this expression is evaluated on the following source XML document:

<videos>
    <video id="id1235AA0">
        <title>The Fugitive</title>
        <actorRef>00000003</actorRef>
        <actorRef>00000009</actorRef>
        <actorRef>0000000x</actorRef>
    </video>
    <video id="id1244100">
        <title>Enemy of the State</title>
        <actorRef>00000009</actorRef>
        <actorRef>0000000c</actorRef>
        <actorRef>0000000f</actorRef>
        <actorRef>00000012</actorRef>
    </video>
    <video id="id124E230">
        <title>Clerks</title>
        <actorRef>00000015</actorRef>
        <actorRef>00000018</actorRef>
        <actorRef>0000001b</actorRef>
    </video>
</videos>

The correct, wanted result is produced:

00000009

Using XPath 3.0 (proper subset of XQuery 3.0) one can even write this quite shorter:

let $vSeq := /*/*/actorRef/string()
  return
    for $maxMovies in 
       max(for $actorId in distinct-values($vSeq) 
            return
              index-of($vSeq, $actorId) ! last()
           )
      return 
        $vSeq[index-of($vSeq, .)[$maxMovies]]

And this can be shortened even further using the simple mapping operator (!) to avoid any for-expression:

let $vSeq := /*/*/actorRef/string(),
    $maxOccurs := 
      max(distinct-values($vSeq) ! count(index-of($vSeq, .)) ) 
  return 
    $vSeq[index-of($vSeq, .)[$maxOccurs]]
like image 132
Dimitre Novatchev Avatar answered Nov 18 '22 00:11

Dimitre Novatchev


This is the kind of question that isn't good for a document store when you are just storing the list of actors in videos. I'd suggest also storing the lists of videos that an actor is part of. Then you'd just have to query for the actor that has the most videos elements.

All that said, you can do it with the data you have it just isn't going to be all that fast. You first need to get a distance list of actors. Then query for each actor filter the videos that have that actor and do a count. and then order by count.

let $actors := fn:distinct-values($results/videos/video/actorRef)

for $actor in $actors
let $count := fn:count($results/videos/video[actorRef = $actor])
Order by $count
return ($actor, $count)
like image 37
Tyler Replogle Avatar answered Nov 18 '22 01:11

Tyler Replogle