
How to prevent Facet Terms from tokenizing

I am using a terms facet to get all the unique values of a field and their counts, but I am getting wrong results: each value is split into separate tokens.

term: web 
Count: 1191979 
term: misc 
Count: 1191979 
term: passwd 
Count: 1191979 
term: etc 
Count: 1191979 

While the actual result should be:

term: WEB-MISC /etc/passwd 
Count: 1191979 

Here is my sample query:

{
  "facets": {
    "terms1": {
      "terms": {
        "field": "message"
      }
    }
  }
}
jmnwong asked Apr 10 '12 17:04


1 Answer

If reindexing is an option, the best approach is to change the mapping and mark this field as not_analyzed:

"your_field" : { "type": "string", "index" : "not_analyzed" }

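To see why this mapping change fixes the counts, here is a rough simulation in plain Python. It is not Elasticsearch code: the regex tokenizer only approximates what the default analyzer does to a string field, and `analyze` and `facet_counts` are hypothetical helpers made up for this illustration.

```python
import re
from collections import Counter

def analyze(value):
    """Rough stand-in for an analyzed string field (an assumption, not the
    real analyzer): lowercase, then split on anything non-alphanumeric."""
    return [tok for tok in re.split(r"[^0-9a-z]+", value.lower()) if tok]

def facet_counts(docs, analyzed):
    """Count documents per indexed term, the way a terms facet would."""
    counts = Counter()
    for value in docs:
        # analyzed: one term per token; not_analyzed: the whole value is one term
        terms = set(analyze(value)) if analyzed else {value}
        counts.update(terms)
    return counts

docs = ["WEB-MISC /etc/passwd"] * 3  # three sample documents with the same message

analyzed_counts = facet_counts(docs, analyzed=True)
exact_counts = facet_counts(docs, analyzed=False)

print(dict(analyzed_counts))  # one entry per token, each with the full count
print(dict(exact_counts))     # a single entry for the whole value
```

With the analyzed field every token inherits the document count, which matches the output in the question. With not_analyzed the facet sees the original string as a single term.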
You can use the multi_field type if you also want to keep an analyzed version of the field:

"your_field" : {
  "type" : "multi_field",
  "fields" : {
    "your_field" : {"type" : "string", "index" : "analyzed"},
    "untouched" : {"type" : "string", "index" : "not_analyzed"}
  }
}

This way, you can continue using your_field in queries while running facet searches on your_field.untouched.
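As a minimal sketch, the facet request from the question could be rebuilt against the not_analyzed sub-field like this. It assumes the `message` field was remapped as a multi_field with an `untouched` sub-field as described above. `terms_facet_body` is a hypothetical helper, and the code only builds the JSON body without sending anything to a cluster.

```python
import json

def terms_facet_body(field, facet_name="terms1", size=10):
    """Build a search body with a single terms facet on `field`."""
    return {
        "size": 0,  # skip returning hits; only the facet is of interest
        "facets": {
            facet_name: {
                "terms": {"field": field, "size": size}
            }
        },
    }

# Assumption: "message" is now a multi_field with a not_analyzed "untouched" sub-field.
body = terms_facet_body("message.untouched")
print(json.dumps(body, indent=2))
```

Queries can keep targeting `message`, and only the facet needs to point at `message.untouched`.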

Alternatively, if this field is stored, you can use a script field facet instead:

"facets" : {
  "term" : {
    "terms" : {
      "script_field" : "_fields.your_field.value"
    }
  }
}

As a last resort, if this field is not stored but the record source is stored in the index, you can try this:

"facets" : {
  "term" : {
    "terms" : {
      "script_field" : "_source.your_field"
    }
  }
}

The first solution is the most efficient. The last solution is the least efficient, because the source of every matching document has to be loaded and parsed, and it may take a long time on a large index.

imotov answered Oct 05 '22 18:10