Our Account model has a first_name, last_name and an ssn (social security number). I want to do partial matches on the first_name and last_name but an exact match on the ssn. I have this so far:
settings analysis: {
  filter: {
    substring: {
      type: "nGram",
      min_gram: 3,
      max_gram: 50
    },
    ssn_string: {
      type: "nGram",
      min_gram: 9,
      max_gram: 9
    },
  },
  analyzer: {
    index_ngram_analyzer: {
      type: "custom",
      tokenizer: "standard",
      filter: ["lowercase", "substring"]
    },
    search_ngram_analyzer: {
      type: "custom",
      tokenizer: "standard",
      filter: ["lowercase", "substring"]
    },
    ssn_ngram_analyzer: {
      type: "custom",
      tokenizer: "standard",
      filter: ["ssn_string"]
    },
  }
}
mapping do
  [:first_name, :last_name].each do |attribute|
    indexes attribute, type: 'string',
      index_analyzer: 'index_ngram_analyzer',
      search_analyzer: 'search_ngram_analyzer'
  end
  indexes :ssn, type: 'string', index: 'not_analyzed'
end
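To see what these analyzers actually emit, here is a rough pure-Ruby approximation. It is not Elasticsearch itself: the standard tokenizer is approximated by splitting on non-alphanumeric characters, and the helper names are hypothetical.

```ruby
# Hypothetical helpers approximating the analysis chains above in plain Ruby.
# The "standard" tokenizer is approximated as "split on non-alphanumerics".
def tokenize(text)
  text.scan(/[[:alnum:]]+/)
end

# nGram token filter: every substring whose length is in min_gram..max_gram.
def ngrams(token, min_gram, max_gram)
  (min_gram..max_gram).flat_map do |n|
    token.chars.each_cons(n).map(&:join)
  end
end

# index_ngram_analyzer and search_ngram_analyzer (identical chains):
# standard tokenizer -> lowercase -> substring (nGram 3..50)
def name_analyzer(text)
  tokenize(text).map(&:downcase).flat_map { |t| ngrams(t, 3, 50) }
end

# ssn_ngram_analyzer: standard tokenizer -> ssn_string (nGram 9..9)
def ssn_analyzer(text)
  tokenize(text).flat_map { |t| ngrams(t, 9, 9) }
end

p name_analyzer("Erik")      # => ["eri", "rik", "erik"]
p name_analyzer("erik smi")  # => ["eri", "rik", "erik", "smi"]
p ssn_analyzer("111112222")  # => ["111112222"]
```

Because min_gram and max_gram are both 9, ssn_string turns a 9-digit SSN into just the whole number, so it behaves like an exact match for 9-digit input. Note that the mapping never references ssn_ngram_analyzer; ssn is mapped as not_analyzed.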
My search is as follows:
query: {
  multi_match: {
    fields: ["first_name", "last_name", "ssn"],
    query: query,
    type: "cross_fields",
    operator: "and"
  }
}
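One plausible explanation for the results listed next, assuming cross_fields behaves as the Elasticsearch multi_match documentation describes: fields are grouped by their search analyzer, operator "and" requires every query term to match within a group, and a document matches if any one group matches. The name fields (ngram analyzer) and ssn (not_analyzed, so the whole query string stays a single term) land in different groups, so a mixed query like "erik 111112222" satisfies neither. Below is a deliberately simplified Ruby model of that rule; scoring and Lucene's handling of grams at the same position are ignored, and all names are hypothetical.

```ruby
# Simplified model of a cross_fields multi_match with operator "and".
# Assumption: fields are grouped by search analyzer, each group must match
# every query term, and a document matches if at least one group matches.

def ngrams(token, min_gram, max_gram)
  (min_gram..max_gram).flat_map { |n| token.chars.each_cons(n).map(&:join) }
end

# first_name / last_name: standard tokenizer, lowercase, nGram 3..50
def ngram_analyzer(text)
  text.scan(/[[:alnum:]]+/).map(&:downcase).flat_map { |t| ngrams(t, 3, 50) }
end

# ssn is not_analyzed: the value (and the whole query string) is one term
def keyword_analyzer(text)
  [text]
end

FIELD_GROUPS = [
  { fields: %w[first_name last_name], analyzer: method(:ngram_analyzer) },
  { fields: %w[ssn], analyzer: method(:keyword_analyzer) }
].freeze

def cross_fields_match?(doc, query)
  FIELD_GROUPS.any? do |group|
    analyze = group[:analyzer]
    indexed = group[:fields].flat_map { |field| analyze.call(doc[field]) }
    analyze.call(query).all? { |term| indexed.include?(term) }
  end
end

erik = { "first_name" => "Erik", "last_name" => "Smith", "ssn" => "111112222" }

["erik", "erik smi", "111112222", "erik 111112222"].each do |q|
  puts "#{q.inspect}: #{cross_fields_match?(erik, q)}"
end
# "erik": true
# "erik smi": true
# "111112222": true
# "erik 111112222": false
```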
So this works:
Account.search("erik").records.to_a
and even (for Erik Smith):
Account.search("erik smi").records.to_a
and the ssn:
Account.search("111112222").records.to_a
but not:
Account.search("erik 111112222").records.to_a
Any idea if I am indexing or querying wrong?
Thank you for any help!
Does it have to be done with a single query string? If not, I would do something like this:
PUT /test_index
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 20
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "_all": {
        "enabled": true,
        "index_analyzer": "ngram_analyzer",
        "search_analyzer": "standard"
      },
      "properties": {
        "first_name": {
          "type": "string",
          "include_in_all": true
        },
        "last_name": {
          "type": "string",
          "include_in_all": true
        },
        "ssn": {
          "type": "string",
          "index": "not_analyzed",
          "include_in_all": false
        }
      }
    }
  }
}
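The key asymmetry in this mapping is that _all is ngrammed at index time but analyzed with the plain standard analyzer at search time, so each search word is looked up as-is among the stored grams. A rough Ruby approximation (helper names are hypothetical; the standard analyzer is approximated as split-on-non-alphanumerics plus lowercase):

```ruby
def tokenize(text)
  text.scan(/[[:alnum:]]+/)
end

def ngrams(token, min_gram, max_gram)
  (min_gram..max_gram).flat_map { |n| token.chars.each_cons(n).map(&:join) }
end

# Index time: only fields with include_in_all: true (first_name, last_name)
# are copied into _all and run through ngram_analyzer (lowercase, ngram 2..20).
def all_index_terms(doc)
  %w[first_name last_name].flat_map do |field|
    tokenize(doc[field]).map(&:downcase).flat_map { |t| ngrams(t, 2, 20) }
  end
end

# Search time: the standard analyzer, i.e. whole lowercased words, no ngrams.
def all_search_terms(query)
  tokenize(query).map(&:downcase)
end

# match on _all with operator "and": every search word must be a stored gram.
def all_match?(doc, query)
  stored = all_index_terms(doc)
  all_search_terms(query).all? { |term| stored.include?(term) }
end

erik = { "first_name" => "Erik", "last_name" => "Smith", "ssn" => "111112222" }
p all_match?(erik, "eri smi")  # => true
p all_match?(erik, "eri jon")  # => false
p all_match?(erik, "111112")   # => false (ssn is not in _all)
```

A side effect of this design: a single search word longer than max_gram (20 characters) cannot match anything, because no stored gram is that long.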
Notice the use of the _all field. I included first_name and last_name in _all, but not ssn, and ssn is not analyzed at all since I want to do exact matches against it.
I indexed a couple of documents for illustration:
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"first_name":"Erik","last_name":"Smith","ssn":"111112222"}
{"index":{"_id":2}}
{"first_name":"Bob","last_name":"Jones","ssn":"123456789"}
Then I can query for the partial names and filter by the exact ssn:
POST /test_index/doc/_search
{
  "query": {
    "filtered": {
      "query": {
        "match": {
          "_all": {
            "query": "eri smi",
            "operator": "and"
          }
        }
      },
      "filter": {
        "term": {
          "ssn": "111112222"
        }
      }
    }
  }
}
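In plain terms, this filtered query keeps only documents whose ssn equals the term exactly, then requires every search word to match in _all. A rough Ruby model over the two bulk-indexed documents (hypothetical helpers; scoring is ignored):

```ruby
def tokenize(text)
  text.scan(/[[:alnum:]]+/)
end

def ngrams(token, min_gram, max_gram)
  (min_gram..max_gram).flat_map { |n| token.chars.each_cons(n).map(&:join) }
end

# The two documents from the _bulk request, keyed by _id.
DOCS = {
  "1" => { "first_name" => "Erik", "last_name" => "Smith", "ssn" => "111112222" },
  "2" => { "first_name" => "Bob", "last_name" => "Jones", "ssn" => "123456789" }
}.freeze

# "query" part: match on _all (ngrammed names) with operator "and".
def names_match?(doc, query)
  stored = %w[first_name last_name].flat_map do |field|
    tokenize(doc[field]).map(&:downcase).flat_map { |t| ngrams(t, 2, 20) }
  end
  tokenize(query).map(&:downcase).all? { |word| stored.include?(word) }
end

# "filter" part: term filter on the not_analyzed ssn, i.e. exact equality.
def filtered_search(query, ssn)
  DOCS.select { |_id, doc| doc["ssn"] == ssn && names_match?(doc, query) }.keys
end

p filtered_search("eri smi", "111112222")  # => ["1"]
p filtered_search("eri smi", "111112")     # => []
p filtered_search("bob", "123456789")      # => ["2"]
```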
And I get back what I'm expecting:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.8838835,
    "hits": [
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "1",
        "_score": 0.8838835,
        "_source": {
          "first_name": "Erik",
          "last_name": "Smith",
          "ssn": "111112222"
        }
      }
    ]
  }
}
If you need to be able to do the search with a single query string (no filter), you could include ssn in the _all field as well, but with this setup it will also match on partial strings (like 111112), so that may not be what you want.
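To illustrate the trade-off with the same ngram_analyzer settings (min_gram 2, max_gram 20), here is a rough Ruby sketch comparing _all with and without ssn; the helper names are hypothetical and the standard analyzer is approximated as split-plus-lowercase.

```ruby
def tokenize(text)
  text.scan(/[[:alnum:]]+/)
end

def ngrams(token, min_gram, max_gram)
  (min_gram..max_gram).flat_map { |n| token.chars.each_cons(n).map(&:join) }
end

# Grams stored in _all for the given fields (ngram_analyzer: lowercase, 2..20).
def all_terms(doc, fields)
  fields.flat_map do |field|
    tokenize(doc[field]).map(&:downcase).flat_map { |t| ngrams(t, 2, 20) }
  end
end

# match on _all with operator "and" (standard analyzer at search time).
def match_all_field?(doc, fields, query)
  stored = all_terms(doc, fields)
  tokenize(query).map(&:downcase).all? { |word| stored.include?(word) }
end

erik = { "first_name" => "Erik", "last_name" => "Smith", "ssn" => "111112222" }
names = %w[first_name last_name]
names_and_ssn = names + ["ssn"]

p match_all_field?(erik, names, "erik 111112222")          # => false
p match_all_field?(erik, names_and_ssn, "erik 111112222")  # => true
p match_all_field?(erik, names_and_ssn, "111112")          # => true (partial ssn)
```

Adding ssn to _all makes the single-string query "erik 111112222" work, but the same ngrams also let a fragment like 111112 match.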
If you only want to match prefixes (i.e., search terms that start at the beginning of the words), you should use edge ngrams.
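The difference is easy to see by comparing the grams each filter emits for one word. A small Ruby sketch, assuming the default front-anchored behaviour of Elasticsearch's edge ngram filter (helper names are hypothetical):

```ruby
# nGram filter: substrings starting anywhere in the word.
def ngrams(token, min_gram, max_gram)
  (min_gram..max_gram).flat_map { |n| token.chars.each_cons(n).map(&:join) }
end

# Edge nGram filter: only substrings anchored at the start of the word.
def edge_ngrams(token, min_gram, max_gram)
  upper = [max_gram, token.length].min
  (min_gram..upper).map { |n| token[0, n] }
end

p ngrams("smith", 2, 20).include?("mith")       # => true
p edge_ngrams("smith", 2, 20)                   # => ["sm", "smi", "smit", "smith"]
p edge_ngrams("smith", 2, 20).include?("mith")  # => false
```

With edge ngrams at index time, a search for "smi" still finds Smith, but "mith" no longer does.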
I wrote a blog post about using ngrams which might help you out a little: http://blog.qbox.io/an-introduction-to-ngrams-in-elasticsearch
Here is the code I used for this answer. I tried a few different things, including the setup I posted here and another including ssn in _all, but with edge ngrams. Hope this helps:
http://sense.qbox.io/gist/b6a31c929945ef96779c72c468303ea3bc87320f