
How to sort on analyzed/tokenized field in Elasticsearch?

We're storing a title field in our index and want to use the field for two purposes:

  1. We're analyzing with an ngram filter so we can provide autocomplete and instant results
  2. We want to be able to list results using an ASC sort on the title field rather than score.

The index/filter/analyzer is defined like so:

array(
    'number_of_shards' => $this->shards,
    'number_of_replicas' => $this->replicas,
    'analysis' => array(
        'filter' => array(
            'nGram_filter' => array(
                'type' => 'nGram',
                'min_gram' => 2,
                'max_gram' => 20,
                'token_chars' => array('letter','digit','punctuation','symbol')
            )
        ),

        'analyzer' => array(
            'index_analyzer' => array(
                'type' => 'custom',
                'tokenizer' => 'whitespace',
                'char_filter' => 'html_strip',
                'filter' => array('lowercase','asciifolding','nGram_filter')
            ),
            'search_analyzer' => array(
                'type' => 'custom',
                'tokenizer' => 'whitespace',
                'char_filter' => 'html_strip',
                'filter' => array('lowercase','asciifolding')
            )
        )
    )
),

The problem we're experiencing is unpredictable results when we sort on the title field. After a little searching, we found this at the end of the Elasticsearch sort documentation (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-sort.html#_memory_considerations):

For string based types, the field sorted on should not be analyzed / tokenized.

How can we both analyze the field and sort on it later? Do we need to store the field twice, with one copy using not_analyzed, in order to sort? Since _source also stores the title value in its original state, can't that be used for sorting?

oucil asked Apr 24 '14 15:04


1 Answer

You can use the built-in Multi Field Type in Elasticsearch.

The multi_field type allows to map several core_types of the same value. This can come very handy, for example, when wanting to map a string type, once when it’s analyzed and once when it’s not_analyzed.

In the Elasticsearch Reference, see the String Sorting and Multi Fields guide for how to set up what you need.

Please note that the Multi Field mapping configuration changed between Elasticsearch 0.90.X and 1.X. Use the guide that matches your version:

  • 0.90 Multi Field Type
  • 1.X Multi Field Type
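
As a rough sketch of what this could look like for the title field in the question, in the same PHP array style: the sub-field name raw, the variable names, and the sample query text are assumptions made for illustration, and the analyzer names are the ones defined in the question. Check the exact syntax against the guide for your version before relying on it.

```php
<?php
// Sketch only: 'raw' and the variable names are illustrative choices.

// Elasticsearch 1.x: a core type takes a 'fields' block for extra sub-fields.
$titleMapping1x = array(
    'type' => 'string',
    'index_analyzer' => 'index_analyzer',   // nGram analyzer from the question
    'search_analyzer' => 'search_analyzer',
    'fields' => array(
        'raw' => array(
            'type' => 'string',
            'index' => 'not_analyzed'       // stored as a single term, safe to sort on
        )
    )
);

// Elasticsearch 0.90: the field itself is declared as 'multi_field'; the
// sub-field that shares the parent's name ('title') is the default one.
$titleMapping090 = array(
    'type' => 'multi_field',
    'fields' => array(
        'title' => array(
            'type' => 'string',
            'index_analyzer' => 'index_analyzer',
            'search_analyzer' => 'search_analyzer'
        ),
        'raw' => array(
            'type' => 'string',
            'index' => 'not_analyzed'
        )
    )
);

// Search body: query the analyzed field, sort on the not_analyzed copy.
$searchBody = array(
    'query' => array('match' => array('title' => 'some text')),
    'sort' => array(
        array('title.raw' => array('order' => 'asc'))
    )
);

echo json_encode($searchBody), "\n";
```

Either mapping would go under the properties of the document type, keyed by title. Documents indexed before the sub-field was added need to be reindexed before title.raw has values to sort on, and because a not_analyzed field is sorted on its exact terms, the ordering is case-sensitive.
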

Paige Cook answered Nov 15 '22 10:11