
MarkLogic date comparison in XQuery with or without index

Tags:

marklogic

I need to filter documents by date (last week, last month, etc.) with MarkLogic 8. The database contains 1.3 million XML documents.

The documents look like this:

<work datum_gegenereerd="2015-06-10" gegenereerd="2015-06-10T14:28:48" label="gmb-2015-12000">
 ...

I've created an element attribute range index on work/@datum_gegenereerd (scalar type date).

The following query works but is slow (3 seconds):

xquery version "1.0-ml";
for $a in //work
where xs:date($a/@datum_gegenereerd) > current-date() - 5 * xs:dayTimeDuration('P1D')
return
<hit>{base-uri($a)}</hit>

After a lot of experimenting, it turns out that I can get the query time down to 0.02 seconds by removing the xs:date cast from the where clause:

xquery version "1.0-ml";
for $a in //work
where $a/@datum_gegenereerd > current-date() - 5 * xs:dayTimeDuration('P1D')
return
<hit>{base-uri($a)}</hit>

Can anyone explain this behaviour?


Update:
When I delete the attribute range index, the second variant slows down to 3+ seconds as well, and recreating the index makes it fast again. This makes me wonder how to read David's statement below that there is no way to use a custom index from plain XQuery. (BTW: the query returns 1267 XML documents, out of a possible 450000 documents with root element work, in a total database of 1.35 million documents.)

Update 2:
I messed up the performance metric of 0.02 seconds, but the query is very fast in the query console. Of the 3 versions, the cts:search one seems a tiny bit faster.

M_breeb asked Oct 29 '25 00:10



1 Answer

You may have created an index, but you are not using it. You need to use an element-attribute-range-query to find all of the fragments that have dates in the range in question.

Something like:

cts:search(doc(), cts:element-attribute-range-query(xs:QName("work"), xs:QName("datum_gegenereerd"), ">", current-date() - 5 * xs:dayTimeDuration('P1D')))

BUT: if you really just want the URIs, then the element-attribute-range-query can be used with cts:uris (something like this, but check the docs):

cts:uris('', (), cts:element-attribute-range-query(xs:QName("work"), xs:QName("datum_gegenereerd"), ">", current-date() - 5 * xs:dayTimeDuration('P1D')))

The second one does everything in memory and just pulls the URIs from the URI lexicon that point to document fragments where the date query matches.
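
To get the same <hit> output as the queries in the question, the cts:uris call could be wrapped roughly as follows. This is only a sketch: it assumes the URI lexicon is enabled on the database, that the attribute range index on work/@datum_gegenereerd has scalar type date, and that work is in no namespace. The variable names $cutoff and $query are made up for illustration.

```xquery
xquery version "1.0-ml";

(: Cutoff date: five days before today, as in the question. :)
let $cutoff := current-date() - 5 * xs:dayTimeDuration('P1D')

(: Range query on the attribute; it needs the element attribute
   range index on work/@datum_gegenereerd (scalar type date). :)
let $query := cts:element-attribute-range-query(
  xs:QName("work"), xs:QName("datum_gegenereerd"), ">", $cutoff)

(: Read the matching URIs from the URI lexicon (assumed to be enabled)
   and wrap each one in a hit element, like the original query did. :)
for $uri in cts:uris((), (), $query)
return <hit>{$uri}</hit>
```

For "last week" or "last month", only the duration used to compute $cutoff should need to change.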

David Ennis __17llamas __ answered Nov 01 '25 13:11



