
How to combine completion, suggestion and match phrase across multiple text fields?

I've been reading about Elasticsearch suggesters, match phrase prefix and highlighting, and I'm a bit confused as to which to use for my problem.

Requirement: I have a bunch of different text fields and need to autocomplete and autosuggest across all of them, as well as handle misspellings. Basically the way Google works.

In the following Google snapshot, when we start typing "Can", it lists words like Canadian, Canada, etc. This is autocomplete. However, it also lists additional words like tire, post, post tracking, coronavirus, etc. This is autosuggest. It searches for the most relevant word in all fields. If we type "canxad", it should also correct the misspelling and suggest the same results.

[Image: Google search suggestions screenshot]

Could someone please give me some hints on how I can implement the above functionality across a bunch of text fields?

At first I tried this:

GET /myindex/_search
{
  "query": {
    "match_phrase_prefix": {
      "myFieldThatIsCombinedViaCopyTo": "revis"
    }
  },
  "highlight": {
    "fields": {
      "*": {}
    },
    "require_field_match" : false
  }
}

but it returns highlights like this:

"In the aforesaid revision filed by the members of the Committee, the present revisionist was also party",

So that's not a "prefix" anymore...

Also tried this:

GET /myindex/_search
{
  "query": {
    "multi_match": {
      "query": "revis",
      "fields": ["myFieldThatIsCombinedViaCopyTo"],
      "type": "phrase_prefix",
      "operator": "and"
    }
  },
  "highlight": {
    "fields": {
      "*": {}
    }
  }
}

But it still returns

"In the aforesaid revision filed by the members of the Committee, the present revisionist was also party",

Note: I have about 5 "text" fields that I need to search upon. One of those fields is quite long (1000s of words). If I break things up into keywords, I lose the phrase. So it's like I need match phrase prefix across a combined text field, with fuzziness?

EDIT Here's an example of a document (some fields taken out, content snipped):

{
  "id" : 1,
  "respondent" : "Union of India",
  "caseContent" : "<snip>..against the Union of India, through the ...<snip>"
}

As @Vlad suggested, I tried this:

POST /cases/_search
{
  "suggest": {
    "respondent-suggest": {
      "prefix": "uni",
      "completion": {
        "field": "respondent.suggest",
        "skip_duplicates": true
      }
    },
    "caseContent-suggest": {
      "prefix": "uni",
      "completion": {
        "field": "caseContent.suggest",
        "skip_duplicates": true
      }
    }
  }
}

Which returns this:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "caseContent-suggest" : [
      {
        "text" : "uni",
        "offset" : 0,
        "length" : 3,
        "options" : [ ]
      }
    ],
    "respondent-suggest" : [
      {
        "text" : "uni",
        "offset" : 0,
        "length" : 3,
        "options" : [
          {
            "text" : "Union of India",
            "_index" : "cases",
            "_type" : "_doc",
            "_id" : "dI5hh3IBEqNFLVH6-aB9",
            "_score" : 1.0,
            "_ignored" : [
              "headNote.suggest"
            ],
            "_source" : {
              <snip>
            }
          }
        ]
      }
    ]
  }
}

So it looks like it matches on the respondent field, which is great! But it didn't match on the caseContent field, even though the text (see above) includes the phrase "against the Union of India". Shouldn't it match there? Or is it because of how the text is broken up?

RPM1984 asked Jun 03 '20 03:06




1 Answer

Since you need autocomplete/suggest on each field, you need to run a suggest query on each field rather than on the copy_to field. That way you're guaranteed to get the proper prefixes.

copy_to fields are great for searching in multiple fields, but not so good for auto-suggest/-complete type of queries.

The idea is that for each of your fields, you should have a completion sub-field so that you can get auto-complete results for each of them.

PUT index
{
  "mappings": {
    "properties": {
      "text1": {
        "type": "text",
        "fields": {
          "suggest": {
            "type": "completion"
          }
        }
      },
      "text2": {
        "type": "text",
        "fields": {
          "suggest": {
            "type": "completion"
          }
        }
      },
      "text3": {
        "type": "text",
        "fields": {
          "suggest": {
            "type": "completion"
          }
        }
      }
    }
  }
}

Your suggest queries would then run on all the sub-fields directly:

POST index/_search?pretty
{
    "suggest": {
        "text1-suggest" : {
            "prefix" : "revis", 
            "completion" : { 
                "field" : "text1.suggest" 
            }
        },
        "text2-suggest" : {
            "prefix" : "revis", 
            "completion" : { 
                "field" : "text2.suggest" 
            }
        },
        "text3-suggest" : {
            "prefix" : "revis", 
            "completion" : { 
                "field" : "text3.suggest" 
            }
        }
    }
}

That takes care of the auto-complete/-suggest part. For misspellings, the completion suggester also accepts a fuzzy parameter.
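
As a rough sketch of the fuzzy variant, the Python snippet below builds the same multi-field suggest body with a fuzzy block added to each completion suggester. The helper name build_suggest_body and the typed prefix are illustrative assumptions, and it assumes the text1/text2/text3 mapping shown above. The resulting dict would be sent as the body of POST index/_search with whatever client you use.

```python
import json


def build_suggest_body(prefix, fields, fuzziness="AUTO"):
    """Build a suggest request body with one fuzzy completion suggester per field.

    Hypothetical helper: assumes every field has a `suggest` completion
    sub-field, as in the mapping above.
    """
    suggest = {}
    for field in fields:
        suggest[f"{field}-suggest"] = {
            "prefix": prefix,
            "completion": {
                "field": f"{field}.suggest",
                "skip_duplicates": True,
                # Allow small typos in the typed prefix.
                "fuzzy": {"fuzziness": fuzziness},
            },
        }
    return {"suggest": suggest}


body = build_suggest_body("revsi", ["text1", "text2", "text3"])
print(json.dumps(body, indent=2))
```

Keeping one named suggester per field means the response tells you which field each suggestion came from, at the cost of merging the lists yourself.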

UPDATE

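If you need to do prefix search on all sentences within a body of text, the approach needs to change a bit, because a completion field only matches from the beginning of each input it was given, not phrases in the middle of it.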

The new mapping below creates a new completion field next to the text one. The idea is to apply a small transformation (i.e. split sentences) to what you're going to store in the completion field. So first create the index mapping like this:

PUT test
{
  "mappings": {
    "properties": {
      "text1": {
        "type": "text"
      },
      "text1Suggest": {
        "type": "completion"
      }
    }
  }
}

Then create an ingest pipeline that will populate the text1Suggest field with sentences from the text1 field:

PUT _ingest/pipeline/sentence
{
  "processors": [
    {
      "split": {
        "field": "text1",
        "target_field": "text1Suggest.input",
        "separator": "\\.\\s+"
      }
    }
  ]
}

Then we can index a document such as this one (with only the text1 field, as the completion field will be built dynamically):

PUT test/_doc/1?pipeline=sentence
{
  "text1": "The crazy fox. The cat drinks milk. John goes to the beach"
}

What gets indexed looks like this (your text1 field + another completion field optimized for sentence prefix completion):

{
  "text1": "The crazy fox. The cat drinks milk. John goes to the beach",
  "text1Suggest": {
    "input": [
      "The crazy fox",
      "The cat drinks milk",
      "John goes to the beach"
    ]
  }
}
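
To get a feel for what the separator produces, here is a rough Python approximation of the split step. It is only an emulation under the assumption that Python's re module treats this simple pattern the same way the ingest processor does; the function name split_sentences is made up for the example.

```python
import re

# Same pattern as the pipeline's "separator" once the JSON escaping is removed:
# a literal period followed by one or more whitespace characters.
SENTENCE_SEPARATOR = re.compile(r"\.\s+")


def split_sentences(text):
    """Approximate what the split processor stores in text1Suggest.input."""
    return SENTENCE_SEPARATOR.split(text)


sentences = split_sentences("The crazy fox. The cat drinks milk. John goes to the beach")
print(sentences)

# A final period with no whitespace after it is not a separator,
# so it stays attached to the last sentence.
print(split_sentences("First one. Second one."))
```

Since the pattern only splits on a period followed by whitespace, abbreviations such as "Mr. Smith" would also be split, so real case text may need a more careful separator.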

And finally, you can search for prefixes of any sentence. Below we search for John, and you should get a suggestion:

POST test/_search?pretty
{
  "suggest": {
    "text1-suggest": {
      "prefix": "John",
      "completion": {
        "field": "text1Suggest"
      }
    }
  }
}
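
On the client side, the suggestions from all suggesters usually need to be merged into one list. The following is a minimal sketch of that step, assuming the response has already been parsed into a dict shaped like the one in the question; collect_suggestions and the sample data are illustrative, not part of any Elasticsearch client.

```python
def collect_suggestions(response):
    """Return de-duplicated suggestion texts from every suggester in a response."""
    seen = []
    for entries in response.get("suggest", {}).values():
        for entry in entries:
            for option in entry.get("options", []):
                text = option["text"]
                if text not in seen:
                    seen.append(text)
    return seen


# Trimmed-down response in the shape shown in the question (illustrative data).
sample_response = {
    "suggest": {
        "caseContent-suggest": [
            {"text": "uni", "offset": 0, "length": 3, "options": []}
        ],
        "respondent-suggest": [
            {
                "text": "uni",
                "offset": 0,
                "length": 3,
                "options": [{"text": "Union of India", "_score": 1.0}],
            }
        ],
    }
}

print(collect_suggestions(sample_response))
```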

Val answered Nov 15 '22 08:11