I've recently started using ElasticSearch. I try to complete some use cases. I have a problem for one of them.
I have indexed some users with their full name (e.g. "Jean-Paul Gautier", "Jean De La Fontaine").
I try to get all the full names responding to some query.
For example, I want the 100 most frequent full names beggining by "J"
{
"query": {
"query_string" : { "query": "full_name:J*" } }
},
"facets":{
"name":{
"terms":{
"field": "full_name",
"size":100
}
}
}
}
The result I get is all the words of the full names : "Jean", "Paul", "Gautier", "De", "La", "Fontaine".
How to get "Jean-Paul Gautier" and "Jean De La Fontaine" (all the full_name values begging by 'J') ? The "post_filter" option is not doing this, it only restrict this above subset.
Thanks
You just need to set "index": "not_analyzed"
on the field, and you will be able to get back the full, unmodified field values in your facet.
Typically, it's nice to have one version of the field that isn't analyzed (for faceting) and another that is (for searching). The "multi_field"
field type is useful for this.
So in this case, I can define a mapping as follows:
curl -XPUT "http://localhost:9200/test_index/" -d'
{
"mappings": {
"people": {
"properties": {
"full_name": {
"type": "multi_field",
"fields": {
"untouched": {
"type": "string",
"index": "not_analyzed"
},
"full_name": {
"type": "string"
}
}
}
}
}
}
}'
Here we have two sub-fields. The one with the same name as the parent will be the default, so if you search against the "full_name"
field, Elasticsearch will actually use "full_name.full_name"
. "full_name.untouched"
will give you the facet results you want.
So next I add two documents:
curl -XPUT "http://localhost:9200/test_index/people/1" -d'
{
"full_name": "Jean-Paul Gautier"
}'
curl -XPUT "http://localhost:9200/test_index/people/2" -d'
{
"full_name": "Jean De La Fontaine"
}'
And then I can facet on each field to see what is returned:
curl -XPOST "http://localhost:9200/test_index/_search" -d'
{
"size": 0,
"facets": {
"name_terms": {
"terms": {
"field": "full_name"
}
},
"name_untouched": {
"terms": {
"field": "full_name.untouched",
"size": 100
}
}
}
}'
and I get back the following:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"facets": {
"name_terms": {
"_type": "terms",
"missing": 0,
"total": 7,
"other": 0,
"terms": [
{
"term": "jean",
"count": 2
},
{
"term": "paul",
"count": 1
},
{
"term": "la",
"count": 1
},
{
"term": "gautier",
"count": 1
},
{
"term": "fontaine",
"count": 1
},
{
"term": "de",
"count": 1
}
]
},
"name_untouched": {
"_type": "terms",
"missing": 0,
"total": 2,
"other": 0,
"terms": [
{
"term": "Jean-Paul Gautier",
"count": 1
},
{
"term": "Jean De La Fontaine",
"count": 1
}
]
}
}
}
As you can see, the analyzed field returns single-word, lower-cased tokens (when you don't specify an analyzer, the standard analyzer is used), and the un-analyzed sub-field returns the unmodified original text.
Here is a runnable example you can play with: http://sense.qbox.io/gist/7abc063e2611846011dd874648fd1b77450b19a5
Try altering the mapping for "full_name":
"properties": {
"full_name": {
"type": "string",
"index": "not_analyzed"
}
...
}
not_analyzed
means that it will be kept as is, capitals, spaces, dashes etc, so that "Jean De La Fontaine" will stay findable and not be tokenized into "Jean" "De" "La" "Fontaine"
You can experiment with different analyzers using the api
Notice what the standard one does to a mulit part name:
GET /_analyze?analyzer=standard
{'Jean Claude Van Dame'}
{
"tokens": [
{
"token": "jean",
"start_offset": 2,
"end_offset": 6,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "claude",
"start_offset": 7,
"end_offset": 13,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "van",
"start_offset": 14,
"end_offset": 17,
"type": "<ALPHANUM>",
"position": 3
},
{
"token": "dame",
"start_offset": 18,
"end_offset": 22,
"type": "<ALPHANUM>",
"position": 4
}
]
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With