Given a phrase match query like this:
{
    'match_phrase': {
        'text.english': {
            'query': "The fox jumped over the wall",
            'phrase_slop': 4,
        }
    }
}
Is there a way I can group results by the exact match?
So if I have 1 document with text.english containing "The quick fox jumps over the small wall" and 3 documents containing "The lazy fox jumped over the big wall", I'd like to end up with those two groups of results.
I'm OK with running multiple queries and doing some processing outside of ES, but I need a solution that performs reasonably over a large set of documents. Ideally I'm hoping there's a way to do this using aggregations that I've missed.
The best solution I've come up with is to run the query above with highlights, parse out all of the highlights from all of the results, and group them based on highlight content. This is fine for very small result sets, but over a result set of 1000+ documents it is prohibitively slow.
EDIT: Maybe I can make this a bit clearer. If I have sample documents with the following values:
I want to be able to group my results as follows with query text "The fox jumped over the wall":
In my opinion, highlighting is the only option, because it's the only way Elasticsearch will show which parts of the text matched. And in your case, you want to group documents based on what matched.
If the text had been shorter (say, a few words), a more involved solution might have been to split the text into shingles (word n-grams) and somehow group on those phrases... maybe.
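To make the shingle idea a bit more concrete, here is a minimal sketch in plain Python of what word shingles look like. This is only a client-side illustration: the function name `shingles` and its `min_size`/`max_size` parameters are made up for this example, and it is not how Elasticsearch's own shingle token filter is configured or how it orders its output.

```python
def shingles(text, min_size=2, max_size=3):
    """Return word n-grams ("shingles") of the given sizes.

    Hypothetical helper for illustration only: lowercases the text,
    splits on whitespace, and emits every run of `size` adjacent words.
    """
    words = text.lower().split()
    out = []
    for size in range(min_size, max_size + 1):
        # Slide a window of `size` words across the text.
        for i in range(len(words) - size + 1):
            out.append(" ".join(words[i:i + size]))
    return out


if __name__ == "__main__":
    for s in shingles("The lazy fox jumped over the big wall"):
        print(s)
```

Grouping on such phrases would mean bucketing documents by the shingles they share, which gets unwieldy quickly as the text grows. That is why it only seems plausible for short fields.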
But for pages of text, I think the only option is to use highlighting and perform additional steps afterwards to group the highlighted parts.
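As a rough sketch of that post-processing step, the Python below groups hits by the text spanning the first to the last highlighted term in each fragment. It assumes the default `<em>`/`</em>` highlight tags and the usual response shape where each hit carries a `highlight` dict keyed by field name. The helper names `matched_span` and `group_hits_by_match` are made up for this example, and the normalization (lowercasing, collapsing whitespace) is just one possible choice.

```python
import re
from collections import defaultdict

# Greedy match: from the first <em> to the last </em> in a fragment.
EM_SPAN = re.compile(r"<em>.*</em>", re.DOTALL)
EM_TAG = re.compile(r"</?em>")


def matched_span(fragment):
    """Return the normalized text between the first and last highlighted
    term of a fragment, or None if nothing is highlighted."""
    m = EM_SPAN.search(fragment)
    if m is None:
        return None
    text = EM_TAG.sub("", m.group(0))
    return " ".join(text.lower().split())


def group_hits_by_match(hits, field="text.english"):
    """Map each matched span to the list of document ids containing it.

    `hits` is assumed to be the list under response["hits"]["hits"].
    """
    groups = defaultdict(list)
    for hit in hits:
        for fragment in hit.get("highlight", {}).get(field, []):
            key = matched_span(fragment)
            # Skip fragments without highlights and avoid duplicate ids.
            if key is not None and hit["_id"] not in groups[key]:
                groups[key].append(hit["_id"])
    return dict(groups)


if __name__ == "__main__":
    # Hand-written hits for illustration, not real Elasticsearch output.
    sample_hits = [
        {"_id": "1", "highlight": {"text.english": [
            "The quick <em>fox</em> <em>jumps</em> <em>over</em> the small <em>wall</em>"]}},
        {"_id": "2", "highlight": {"text.english": [
            "The lazy <em>fox</em> <em>jumped</em> <em>over</em> the big <em>wall</em>"]}},
    ]
    print(group_hits_by_match(sample_hits))
```

This does not make the highlighting itself any cheaper, so for large result sets you would probably still want to page through the results (or restrict the query) instead of pulling all highlights at once.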