I'm using the query below to get distinct values from XML files stored in a collection in MarkLogic. The collection contains more than 40k files.
When the query is executed, it takes a long time to return results. Is there a way to optimize the query below, or another way to get the same values without XPath?
XQuery:
fn:distinct-values(fn:collection(collectionName)//caseml/case[@jur eq "in"]/@year)
Input XML Example:
<?xml version="1.0" encoding="UTF-8"?>
<caseml>
<case jur="in" series="mlj" volume="1" year="2016" startpage="129">
<p num="y" pnum="22">
<text>
In view of the aforesaid discussion, we find the writ petition completely devoid
of any merit and accordingly, we dismiss the same, leaving the parties to bear their
own costs.
</text>
</p>
</case>
</caseml>
The above XQuery works, but I need to get the results faster.
For fast retrieval of atomic values across a large set of documents, you want to configure a range index. A range index instructs MarkLogic to extract the values at index time and keep them in a memory-resident data structure, so they can be read without parsing each document from disk. Since you want the values at a specific path, configure a path range index on that path. After reindexing, you can use cts:values to retrieve the distinct values directly from the index. You can optionally pass a cts:query to the call to restrict the results to values from documents matching some criteria.
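As a rough sketch of what that could look like for the query in the question: the example below assumes a path range index has already been configured on the path /caseml/case/@year with scalar type int (both the path and the type are assumptions, and they must match the index definition exactly), and that "collectionName" stands in for the real collection name. The jur restriction is expressed as a cts:element-attribute-value-query.

```xquery
xquery version "1.0-ml";

(: Assumption: a path range index exists on /caseml/case/@year
   with scalar type int. The reference must match that index. :)
cts:values(
  cts:path-reference("/caseml/case/@year", ("type=int")),
  (),   (: no start value :)
  (),   (: no options :)
  cts:and-query((
    (: "collectionName" is a placeholder for the actual collection :)
    cts:collection-query("collectionName"),
    (: keep only documents that contain <case jur="in"> :)
    cts:element-attribute-value-query(
      xs:QName("case"), xs:QName("jur"), "in")
  ))
)
```

Note that the cts:query filters documents (fragments), not individual case elements. With one case per document, as in the sample input, this should give the same values as the original XPath; if a document could contain several case elements with different jur values, the years of all of them would be returned.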