
How to deal with punctuation in an ElasticSearch field

I have a field in a document stored in Elasticsearch that I want to be analyzed as a full-text field. In one case, the name field contains a value like this:

A&B Corp

I want to search these documents from an auto-complete widget, using a query like the one below (suppose the user typed A&B into the autocomplete field). The intention is to match documents that contain any terms with the typed prefix.

```json
{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "A&B*",
          "fields": [
            "firstName",
            "lastName",
            "name",
            "key",
            "email"
          ]
        }
      },
      "filter": {
        "terms": {
          "environmentId": [
            "foo"
          ]
        }
      }
    }
  }
}
```

My mapping for the name field looks like this:

"name": {
    "type": "string"
},

But, I get no results. The query structure works for documents that don't have & in the field, so I'm pretty sure that is part of the problem.

But, I'm not sure how to deal with this. I am pretty sure I still want to analyze the field for full text search.

In addition, if I add a space before the * in the query (i.e., "query": "A&B *"), then I get results including A&B, so I don't think it is just discarding the ampersand and treating the A and B as separate terms.

Should I change my mapping? The query?

pkaeding asked Oct 31 '22 18:10



1 Answer

The query_string query has a set of reserved characters that need to be escaped.

See the reserved characters section of the query_string documentation.

So to search for

'A&B' (or) 'A&B Corp' (or) 'A&B....'

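Your query must escape the reserved ampersand, i.e. "A\\&B*" (the backslash is doubled because it sits inside a JSON string), leaving the trailing * unescaped so that the query_string parser treats it as a wildcard operator.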

  1. As written, your query is searching for an exact match of "A&B*", so it expects the asterisk to be part of your data.

  2. When you search for "A&B *", the whitespace splits the query string into separate terms, so it is now searching for "A&B" (or) "*", and hence you get a match in this case.
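For an auto-complete widget the escaping usually has to happen in application code, because the user's raw input is what gets pasted into the query. The following is a minimal sketch of that step in Python, not a definitive implementation: `escape_query_string` and `build_prefix_query` are hypothetical helper names, and the character set is taken from the reserved characters list in the query_string documentation (which also notes that `<` and `>` cannot be escaped, so the sketch drops them).

```python
import json

# Characters the query_string syntax treats specially; each one gets a
# backslash prefix. Taken from the documented reserved characters list.
RESERVED = set('+-=&|!(){}[]^"~*?:\\/')

# Assumption based on the documentation: < and > cannot be escaped at all,
# so this sketch removes them from the input instead.
UNESCAPABLE = set('<>')


def escape_query_string(text):
    """Return text with query_string reserved characters escaped."""
    out = []
    for ch in text:
        if ch in UNESCAPABLE:
            continue
        if ch in RESERVED:
            out.append('\\' + ch)
        else:
            out.append(ch)
    return ''.join(out)


def build_prefix_query(user_input, fields):
    """Build a query_string clause that treats user_input as a prefix.

    The trailing * is appended after escaping, so it stays a wildcard.
    """
    return {
        "query_string": {
            "query": escape_query_string(user_input) + "*",
            "fields": fields,
        }
    }


# Hypothetical usage: the user typed A&B into the autocomplete field.
clause = build_prefix_query("A&B", ["firstName", "lastName", "name", "key", "email"])
print(json.dumps(clause, indent=2))
```

Serializing with `json.dumps` doubles the backslash automatically, which is why the escape appears as `\\` in the JSON request body. This sketch only covers escaping; whether the escaped prefix then matches still depends on how the field's analyzer tokenizes values such as A&B Corp.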

Divya Sriram answered Nov 15 '22 07:11
