Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

NEST Query for exact text matching

I am trying to write a NEST query that should return results based on exact string match. I have researched on web and there are suggestions about using Term, Match, MatchPhrase. I have tried all those but my searches are returning results that contains part of search string. For example, In my database i have following rows of email addresses:

[email protected]

[email protected]

[email protected]

Irrespective of whether i use:

client.Search<Emails>(s => s.From(0)
                        .Size(MaximumSearchResultsSize)
                        .Query(q => q.Term( p=> p.OnField(fielname).Value(fieldValue))))

or

  client.Search<Emails>(s => s.From(0).
                              Size(MaximumPaymentSearchResults).
                              Query(q=>q.Match(p=>p.OnField(fieldName).Query(fieldValue))));                                              

My search results are always returning rows containing "partial search" string.

So, if i provide the search string as "ter", I am still getting all the 3 rows. [email protected]

[email protected]

[email protected]

I expect to see no rows returned if the search string is "ter".If the search string is "[email protected]" then i would like to see only "[email protected]".

Not sure what am i doing wrong.

like image 549
CSC Avatar asked Aug 25 '15 17:08

CSC


People also ask

Should minimum should match?

Minimum Should Match is another search technique that allows you to conduct a more controlled search on related or co-occurring topics by specifying the number of search terms or phrases in the query that should occur within the records returned.

What are term based search queries?

Term queryedit. Returns documents that contain an exact term in a provided field. You can use the term query to find documents based on a precise value such as a price, a product ID, or a username. Avoid using the term query for text fields.


1 Answers

Based on the information you have provided in the question, it sounds like the field that contains the email address has been indexed with the Standard Analyzer, the default analyzer applied to string fields if no other analyzer has been specified or the field is not marked as not_analyzed.

The implications of the standard analyzer on a given string input can be seen by using the Analyze API of Elasticsearch:

curl -XPOST "http://localhost:9200/_analyze?analyzer=standard&text=ter%40gmail.com

The text input needs to be url encoded, as demonstrated here with the @ symbol. The results of running this query are

{
   "tokens": [
      {
         "token": "ter",
         "start_offset": 0,
         "end_offset": 3,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "gmail.com",
         "start_offset": 4,
         "end_offset": 13,
         "type": "<ALPHANUM>",
         "position": 2
      }
   ]
}

We can see that the standard analyzer produces two tokens for the input, ter and gmail.com, and this is what will be stored in the inverted index for the field.

Now, running a Match query will cause the input to the match query to be analyzed, by default using the same analyzer as the one found in the mapping definition for the field on which the match query is being applied.

The resulting tokens from this match query analysis are then combined by default into a boolean or query such that any document that contains any one of the tokens in inverted index for the field will be a match. For the example

text [email protected], this would mean any documents that have a match for ter or gmail.com for the field would be a hit

// Indexing
input: [email protected] -> standard analyzer -> ter,gmail.com in inverted index

// Querying
input: [email protected] -> match query -> docs with ter or gmail.com are a hit!

Clearly, for an exact match, this is not what we intend at all!

Running a Term query will cause the input to the term query to not be analyzed i.e. it's a query for an exact match to the term input, but running this on a field that has been analyzed at index time could potentially be a problem; since the value for the field has undergone analysis but the input to the term query has not, you are going to get results returned that exactly match the term input as a result of the analysis that happened at index time. For example

// Indexing
input: [email protected] -> standard analyzer -> ter,gmail.com in inverted index

// Querying
input: [email protected] -> term query -> No exact matches for [email protected]

input: ter -> term query -> docs with ter in inverted index are a hit!

This is not what we want either!

What we probably want to do with this field is set it to be not_analyzed in the mapping definition

putMappingDescriptor
    .MapFromAttributes()
    .Properties(p => p
        .String(s => s.Name(n => n.FieldName).Index(FieldIndexOption.NotAnalyzed)
    );

With this in place, we can search for exact matches with a Term filter using a Filtered query

// change dynamic to your type
var docs = client.Search<dynamic>(b => b
    .Query(q => q
        .Filtered(fq => fq
            .Filter(f => f
                .Term("fieldName", "[email protected]")
            )
        )
    )
);

which will produce the following query DSL

{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "fieldName": "[email protected]"
        }
      }
    }
  }
}
like image 106
Russ Cam Avatar answered Sep 29 '22 11:09

Russ Cam