
MarkLogic - Get distinct result set without using XPath

I'm using the query below to get distinct values from XML files stored in a collection in MarkLogic. The collection contains more than 40k files.

When the query is executed, it takes a long time to return results. Is there a way to optimize the query below, or an alternative that gets the same values without XPath?

XQuery:

fn:distinct-values(fn:collection(collectionName)//caseml/case[@jur eq "in"]/@year)

Input XML Example:

<?xml version="1.0" encoding="UTF-8"?>
<caseml>
  <case jur="in" series="mlj" volume="1" year="2016" startpage="129">
    <p num="y" pnum="22">
      <text>
        In view of the aforesaid discussion, we find the writ petition completely devoid
        of any merit and accordingly, we dismiss the same, leaving the parties to bear their
        own costs.
      </text>
    </p>
  </case>
</caseml>

The above XQuery works, but I need to get the results faster.

Sankar asked Dec 25 '22 03:12



1 Answer

For fast retrieval of atomic values across a large set of documents, you want to configure a range index. A range index instructs MarkLogic to extract the values at index time and keep them in a memory-resident data structure, so they can be read without touching the disk and without opening each of the 40k documents. Since you want the values at a specific path, configure a path range index. After reindexing, you can use cts:values to retrieve the distinct values directly from the index. You can optionally pass a cts:query to the call to restrict the results to documents matching some criteria.
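A minimal sketch of what that lookup might look like, under several assumptions that are not in the question: a path range index on /caseml/case/@year with scalar type int has already been configured and reindexing has finished, and the collection is named "cases" (a hypothetical name). The jur="in" condition is expressed as a cts:query rather than an XPath predicate.

```xquery
xquery version "1.0-ml";

(: Assumption: a path range index exists on /caseml/case/@year with
   scalar type int. If the index was created as a string index, use
   "type=string" (plus a matching collation option) instead. :)
let $collection-name := "cases"  (: hypothetical collection name :)
return
  cts:values(
    (: reference to the path range index holding the year values :)
    cts:path-reference("/caseml/case/@year", "type=int"),
    (),  (: no start value: begin at the first value in the index :)
    (),  (: no extra options :)
    (: restrict to documents in the collection that contain
       a case element with jur="in" :)
    cts:and-query((
      cts:collection-query($collection-name),
      cts:element-attribute-value-query(
        xs:QName("case"), xs:QName("jur"), "in", "exact")
    ))
  )
```

cts:values already returns each indexed value once, so fn:distinct-values is not needed. Note that the cts:query selects whole documents: if a single document could hold several case elements with different jur values, the years of all of them would be returned, so this sketch assumes one case per document as in the sample input.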

hunterhacker answered Jan 01 '23 11:01
