
Elastic search fuzzy match with exact matches showing first

I want to use fuzzy matching in a query, but with exact matches shown at the top of the results.

I've tried the following.

$return = $this->_client->search(
            array(
                'index' => self::INDEX,
                'type'  => self::TYPE,
                'body'  => array(
                    'query' => array(
                        'bool' => array(
                            'must' => array(
                                'multi_match' => array(
                                    'query'     => $query,
                                    'fields'    => array('name', 'brand', 'description'),
                                    'boost'     => 10,
                                ),
                                'fuzzy_like_this' => array(
                                    'like_text' => $query,
                                    'fields'    => array('name', 'brand', 'description'),
                                    'fuzziness' => 1,
                                ),
                            ),
                        ),
                    ),
                    'size' => '5000',
                ),
            )
        );

This doesn't work: Elasticsearch rejects it with a malformed query error.

Any ideas?
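
Editor's note: the malformed query error most likely comes from passing `must` a single associative array holding two query types. Each clause has to be its own array containing exactly one query, and several clauses have to be given as a list. The sketch below is one possible well-formed version, not the approach the asker finally used. It assumes that `multi_match` accepts a `fuzziness` parameter in the Elasticsearch version in use, it swaps `fuzzy_like_this` for a second `multi_match`, and the helper name `buildExactFirstBody` is hypothetical.

```php
<?php
// Hypothetical helper (not from the original post): builds a search body in
// which a document that matches the query text as typed satisfies both
// clauses, so it should score above a document that only matches fuzzily.
function buildExactFirstBody($query)
{
    $fields = array('name', 'brand', 'description');

    return array(
        'query' => array(
            'bool' => array(
                // A list of clauses, each wrapping exactly one query type.
                'should' => array(
                    // Non-fuzzy match, boosted so it dominates the score.
                    array('multi_match' => array(
                        'query'  => $query,
                        'fields' => $fields,
                        'boost'  => 10,
                    )),
                    // Fuzzy match allowing an edit distance of 1.
                    array('multi_match' => array(
                        'query'     => $query,
                        'fields'    => $fields,
                        'fuzziness' => 1,
                    )),
                ),
                // Require at least one of the two clauses to match.
                'minimum_should_match' => 1,
            ),
        ),
        'size' => 5000,
    );
}

// Print the JSON that would be sent, to check the structure by eye.
echo json_encode(buildExactFirstBody('samsung')), "\n";
```

The returned array can be passed as the `'body'` value of the `search()` call above. Using `should` instead of `must` means a document needs only one clause to be returned, while matching both raises its score.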

Asked by rsmarsha on Jul 02 '14 at 10:07




1 Answer

In the end I did not use fuzzy matching to solve my problem; I used ngrams instead.

/**
 * Map - Create a new index with property mapping
 */
public function map()
{
    $params['index'] = self::INDEX;

    $params['body']['settings'] = array(
        'index' => array(
            'analysis' => array(
                'analyzer' => array(
                    'product_analyzer' => array(
                        'type'      => 'custom',
                        'tokenizer' => 'whitespace',
                        'filter'    => array('lowercase', 'product_ngram'),
                    ),
                ),
                'filter' =>  array(
                    'product_ngram' => array(
                        'type' => 'nGram',
                        'min_gram' => 3,
                        'max_gram' => 5,
                    ),
                )
            ),

        )
    );

    // Field mapping: name and brand use the ngram analyzer and carry a boost
    $mapping = array(
        '_source'    => array(
            'enabled' => true
        ),
        'properties' => array(
            'id'          => array(
                'type' => 'string',
            ),
            'name'        => array(
                'type'     => 'string',
                'analyzer' => 'product_analyzer',
                'boost'    => '10',
            ),
            'brand'       => array(
                'type' => 'string',
                'analyzer' => 'product_analyzer',
                'boost'    => '5',
            ),
            'description' => array(
                'type' => 'string',
            ),
            'barcodes'    => array(
                'type' => 'string'
            ),
        ),
    );

    $params['body']['mappings'][self::TYPE] = $mapping;

    $this->_client->indices()->create($params);
}


public function search($query)
{
    $return = $this->_client->search(
        array(
            'index' => self::INDEX,
            'type'  => self::TYPE,
            'body'  => array(
                'query' => array(
                    'multi_match' => array(
                        'query'  => $query,
                        'fields' => array('id', 'name', 'brand', 'description', 'barcodes'),
                    ),
                ),
                'size' => '5000',
            ),
        )
    );

    $productIds = array();

    if (!empty($return['hits']['hits'])) {
        foreach ($return['hits']['hits'] as $hit) {
            $productIds[] = $hit['_id'];
        }
    }

    return $productIds;
}

The result is exactly what I was looking for. Matches are built from the ngrams of the search query: the more ngram parts of the query a document contains, the better it matches.
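
To make the ngram idea concrete, here is a rough illustration in plain PHP of what the `product_analyzer` chain (whitespace tokenizer, lowercase, ngrams of 3 to 5 characters) does to a string, and how a misspelled query still shares grams with the intended product. The function names `productNgrams` and `sharedNgrams` are hypothetical, the code assumes ASCII input, and it only counts overlapping grams; Elasticsearch's real scoring is more involved than a simple count.

```php
<?php
// Hypothetical helper: approximates whitespace tokenizing, lowercasing and
// 3-5 character ngrams. This is an illustration, not Elasticsearch code.
// Assumes ASCII text, since strlen/substr work on bytes.
function productNgrams($text, $min = 3, $max = 5)
{
    $grams = array();
    $tokens = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);

    foreach ($tokens as $token) {
        $length = strlen($token);
        for ($size = $min; $size <= $max; $size++) {
            for ($start = 0; $start + $size <= $length; $start++) {
                // Array keys de-duplicate repeated grams.
                $grams[substr($token, $start, $size)] = true;
            }
        }
    }

    return array_keys($grams);
}

// Hypothetical helper: number of distinct grams two strings have in common.
function sharedNgrams($a, $b)
{
    return count(array_intersect(productNgrams($a), productNgrams($b)));
}

print_r(productNgrams('Sony'));              // son, ony, sony
echo sharedNgrams('samsng', 'Samsung'), "\n"; // 3 (sam, ams, sams)
echo sharedNgrams('samsng', 'Sony'), "\n";    // 0
```

The typo "samsng" still shares three grams with "Samsung" and none with "Sony", which is why the ngram mapping gives typo-tolerant results without a fuzzy query.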

Answered by rsmarsha on Oct 11 '22 at 17:10