
MarkLogic cts:element-query false positives?

Given this document:

<items>
  <item><type>T1</type><value>V1</value></item>
  <item><type>T2</type><value>V2</value></item>
</items>

Unsurprisingly, I find that passing this query to cts:uris() returns the document:

cts:and-query((
  cts:element-query(xs:QName('item'),
    cts:element-value-query(xs:QName('type'),'T1')
    ),
  cts:element-query(xs:QName('item'),
    cts:element-value-query(xs:QName('value'),'V2')
    )
  ))

But somewhat surprisingly (to me at least), I find that this query returns it too:

cts:element-query(xs:QName('item'),
  cts:and-query((
    cts:element-value-query(xs:QName('type'),'T1'),
    cts:element-value-query(xs:QName('value'),'V2')
    ))
  )

This doesn't seem right, as there is no single item with type=T1 and value=V2. To me this seems like a false positive.

Have I misunderstood how cts:element-query works? (I have to say that the documentation isn't particularly clear in this area).

Or is this a case where MarkLogic does its best to give me the result I expect, and with more or better indexes in place I would be less likely to get a false positive match?

Andy Key asked May 23 '16 18:05


2 Answers

In addition to the answer by @wst: you only need to enable element value positions to get accurate results from an unfiltered search. Here is some code to show this:

xdmp:document-insert("/items.xml", <items>
  <item><type>T1</type><value>V1</value></item>
  <item><type>T2</type><value>V2</value></item>
</items>);

cts:search(collection(),
  cts:element-query(xs:QName('item'),
    cts:and-query((
      cts:element-value-query(xs:QName('type'),'T1'),
      cts:element-value-query(xs:QName('value'),'V2')
    ))
  ), 'unfiltered'
)

Without element value positions enabled this returns the test document. After enabling the positions, the query returns nothing.
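For completeness, here is a minimal sketch of switching that setting on through the Admin API rather than the Admin UI. The database name "Documents" is only an assumption, so substitute your own. Changing the setting normally triggers a reindex, so the unfiltered result should only change once reindexing has finished.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Documents" is an assumed database name; replace it with yours. :)
let $config := admin:get-configuration()
let $db     := xdmp:database("Documents")
(: Turn on the "element value positions" index setting for that database. :)
let $config := admin:database-set-element-value-positions($config, $db, fn:true())
return admin:save-configuration($config)
```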

As said by @wst, cts:search() runs filtered by default, whereas cts:uris() (and, for instance, xdmp:estimate()) only run unfiltered.

HTH!

grtjn answered Nov 15 '22 20:11


Yes, I think this is a slight misunderstanding of how queries are resolved. In cts:search, the default behavior is to enable the filtered option. In that case ML first resolves the query using only the indexes, and then, once candidate documents have been selected, it loads them into memory, inspects them, and filters out false positives. This is more time consuming, but more accurate.

cts:uris is a lexicon function, so queries passed to it will only resolve via indexes, and there is no option to filter false positives.
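To compare the two behaviours side by side, here is a rough sketch that runs the question's query in three ways. It assumes the sample document is the only relevant content in the database and that the URI lexicon is enabled (cts:uris needs it). Going by the behaviour described in these answers, without position indexes I would expect the filtered count to be 0 while the unfiltered search and cts:uris still report the document.

```xquery
xquery version "1.0-ml";

(: The query from the question: type=T1 and value=V2 inside one <item>. :)
let $q :=
  cts:element-query(xs:QName("item"),
    cts:and-query((
      cts:element-value-query(xs:QName("type"), "T1"),
      cts:element-value-query(xs:QName("value"), "V2")
    ))
  )
return (
  (: Filtered (the default): candidates are loaded and checked. :)
  fn:count(cts:search(fn:collection(), $q)),
  (: Unfiltered: index resolution only, so false positives can remain. :)
  fn:count(cts:search(fn:collection(), $q, "unfiltered")),
  (: Lexicon call: also resolved from the indexes only. :)
  fn:count(cts:uris((), (), $q))
)
```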

The simple way to handle this query via indexes would be to change your schema such that documents are based on <item> instead of <items>. Then each item would have a separate index entry, and results would not be commingled before filtering.
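As a rough illustration of that schema change, the sketch below splits each <item> into its own document. It assumes the source document lives at /items.xml, and the target URI pattern /items/item-N.xml is made up for the example.

```xquery
xquery version "1.0-ml";

(: Assumes the combined document is at /items.xml; the target URIs
   are a hypothetical naming scheme. :)
for $item at $i in fn:doc("/items.xml")/items/item
return
  xdmp:document-insert(fn:concat("/items/item-", $i, ".xml"), $item)
```

After this, the type and value of one item can no longer be combined with those of another item at the index level, because they live in different documents. You would probably also want to delete or archive the original /items.xml so it does not keep matching.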

Another way that doesn't involve updating documents is to wrap the queries you expect to occur in the same element in a cts:near-query. That would prevent a <type> in one <item> from matching with a <value> in a different <item>. I suggest reading the documentation because you may need to enable one or more position-based indexes for cts:near-query to be accurate.

wst answered Nov 15 '22 19:11