Compare two elements of the same document in MarkLogic

I have a MarkLogic 8 database in which there are documents which have two date time fields:

  1. created-on
  2. active-since

I am trying to write an XQuery that finds all the documents in which the value of active-since is less than the value of created-on.

Currently I am using the following FLWOR expression:

    for $entity in fn:collection("entities")

    let $id := fn:data($entity//id)
    let $created-on := fn:data($entity//created-on)
    let $active-since := fn:data($entity//active-since)

    where $active-since < $created-on

    return 
      (
        $id,
        $created-on,
        $active-since
      )

The above query takes too long to execute, and its execution time will keep growing as the number of documents increases.

Also, I have element range indexes on both of the dateTime fields mentioned above, but they are not being used here. The cts-element-query function only compares one element with a set of atomic values. In my case I am trying to compare two elements of the same document.

I think there should be a better and optimized solution for this problem.

Please let me know in case there is any search function or any other approach which will be suitable in this scenario.

Rahul asked Dec 18 '22 16:12


2 Answers

This may be efficient enough for you.

Take each distinct value of one element and build a range query for it. This all uses the range indexes, so in that sense it is efficient. However, at some point a large query gets built. It reads similarly to a FLWOR statement. If you really wanted to be a bit more efficient, you could find out which of your elements has fewer unique values (the size of the index) and use that one for your iteration, thus building a smaller query. Also, you will note that I constrain the cts:element-values call to your collection. This is just in case you happen to have that element in documents outside of your collection. It keeps the list to only those values you know are in your collection:

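    let $q := cts:or-query(
        (: one clause per distinct created-on value within the collection :)
        for $created-on in cts:element-values(
                xs:QName("created-on"), (), (), cts:collection-query("entities"))
        return cts:and-query((
            (: pin created-on to this value so both tests apply to the same document :)
            cts:element-range-query(xs:QName("created-on"), "=", $created-on),
            cts:element-range-query(xs:QName("active-since"), "<", $created-on)
        ))
    )

    return
        cts:search(
            fn:collection("entities"),
            $q
        )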
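
The suggestion above about iterating over whichever element has fewer distinct values could be sketched roughly as follows. This is only a sketch under assumptions: both elements have dateTime range indexes, "entities" is the collection from the question, and each clause pairs an equality test on the iterated element with the range test on the other one so that both conditions have to hold in the same document. The variable names ($scope, $created, $active) are made up for illustration.

```xquery
xquery version "1.0-ml";

(: restrict both lexicon lookups to the collection from the question :)
let $scope := cts:collection-query("entities")
let $created := cts:element-values(xs:QName("created-on"), (), (), $scope)
let $active := cts:element-values(xs:QName("active-since"), (), (), $scope)

(: iterate over the smaller list of distinct values to keep the or-query small :)
let $q :=
  if (fn:count($created) le fn:count($active)) then
    cts:or-query(
      for $v in $created
      return cts:and-query((
        cts:element-range-query(xs:QName("created-on"), "=", $v),
        cts:element-range-query(xs:QName("active-since"), "<", $v)
      ))
    )
  else
    cts:or-query(
      for $v in $active
      return cts:and-query((
        cts:element-range-query(xs:QName("active-since"), "=", $v),
        cts:element-range-query(xs:QName("created-on"), ">", $v)
      ))
    )

return cts:search(fn:collection("entities"), $q)
```

Counting the values this way still reads both value lists once, so it probably only pays off when one list is much shorter than the other.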

So, let's explain what is happening with a simple example.
Let's say I have elements A and B, each with a range index defined.

Let's pretend we have these combinations in 5 documents:
A,B
2,3
4,2
2,7
5,4
2,9

    let $q := cts:or-query(
        for $a in cts:element-values(xs:QName("A"))
        return cts:and-query((
            cts:element-range-query(xs:QName("A"), "=", $a),
            cts:element-range-query(xs:QName("B"), "<", $a)
        ))
    )

This would create the following query:

    cts:or-query((
        cts:and-query((
            cts:element-range-query(xs:QName("A"), "=", 2),
            cts:element-range-query(xs:QName("B"), "<", 2))),
        cts:and-query((
            cts:element-range-query(xs:QName("A"), "=", 4),
            cts:element-range-query(xs:QName("B"), "<", 4))),
        cts:and-query((
            cts:element-range-query(xs:QName("A"), "=", 5),
            cts:element-range-query(xs:QName("B"), "<", 5)))
    ))

In the example above, the matches would be the documents with the combinations (4,2) and (5,4), the only ones in which B is less than A.

David Ennis __17llamas __ answered Feb 12 '23 14:02


You might try using cts:value-tuples(). Pass in three references: active-since, created-on, and the URI reference. Then iterate over the results looking for tuples where active-since is less than created-on, and you'll have the URI of the doc.

It's not the prettiest code, but it will let all the data come from RAM, so it should scale nicely.
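
A minimal sketch of this tuple approach, assuming the built-in in question is cts:value-tuples, that the URI lexicon is enabled on the database, and that both elements have dateTime element range indexes (as the question states). The collection name "entities" comes from the question; the variable names are only illustrative.

```xquery
xquery version "1.0-ml";

(: Assumptions: URI lexicon enabled; dateTime range indexes exist on both elements. :)
for $tuple in cts:value-tuples(
    (
      cts:uri-reference(),
      cts:element-reference(xs:QName("created-on")),
      cts:element-reference(xs:QName("active-since"))
    ),
    (),
    cts:collection-query("entities")
  )
(: each tuple comes back as a json:array; unpack it into a plain sequence :)
let $values := json:array-values($tuple)
let $uri := $values[1]
let $created-on := $values[2]
let $active-since := $values[3]
where $active-since lt $created-on
return $uri
```

The comparison itself still runs once per tuple in the FLWOR, but no documents are read from disk; if a document held several created-on or active-since values, it would show up once per co-occurring combination.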

hunterhacker answered Feb 12 '23 14:02