ElasticSearch Rails - Setting a Custom Analyzer

I'm using ElasticSearch in Rails 4 through elasticsearch-rails (https://github.com/elasticsearch/elasticsearch-rails)

I have a User model, with an email attribute.

I'm trying to use the 'uax_url_email' tokenizer described in the docs:

class User < ActiveRecord::Base
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks

  settings analysis: { analyzer: { whole_email: { tokenizer: 'uax_url_email' } } } do
    mappings dynamic: 'false' do
      indexes :email, analyzer: 'whole_email'
    end
  end

end

I followed examples in the wiki (https://github.com/elasticsearch/elasticsearch-rails/wiki) and the elasticsearch-model docs (https://github.com/elasticsearch/elasticsearch-rails/wiki) to arrive at this.

It doesn't work: the email field never picks up the analyzer. If I query Elasticsearch directly:

curl -XGET 'localhost:9200/users/_mapping'

It returns:

{
  "users": {
    "mappings": {
      "user": {
        "properties": {
          "birthdate": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "created_at": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "email": {
            "type": "string"
          },
          "first_name": {
            "type": "string"
          },
          "gender": {
            "type": "string"
          },
          "id": {
            "type": "long"
          },
          "last_name": {
            "type": "string"
          },
          "name": {
            "type": "string"
          },
          "role": {
            "type": "string"
          },
          "updated_at": {
            "type": "date",
            "format": "dateOptionalTime"
          }
        }
      }
    }
  }
}
Cam Price-Austin asked Aug 12 '14 07:08


1 Answer

This ended up being an issue with how I was creating the index. I was trying:

User.__elasticsearch__.client.indices.delete index: User.index_name
User.import

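I expected this to delete the index and then re-import the records with the model's settings applied. Instead, the import on its own did not recreate the index from the model: Elasticsearch auto-created it when the first documents arrived, using default dynamic mappings, which is why the mapping above shows email as a plain string with no analyzer. What I needed was to recreate the index explicitly, so that the settings and mappings declared in the model are applied before importing: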

User.__elasticsearch__.create_index! force: true
User.import
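
One way to confirm the fix is to check whether the email field in the mapping response now carries the analyzer. The following is a minimal Ruby sketch, not part of elasticsearch-rails: field_analyzer is a hypothetical helper, the first hash is a trimmed copy of the response shown in the question, and the second is an assumed shape of the response after the index has been recreated.

```ruby
require 'json'

# Hypothetical helper (not part of elasticsearch-rails): look up the analyzer
# configured for a field in a parsed _mapping response. Returns nil when the
# field has no explicit analyzer.
def field_analyzer(mapping, index:, type:, field:)
  mapping.dig(index, 'mappings', type, 'properties', field, 'analyzer')
end

# Trimmed copy of the response from the question: email has no analyzer.
before = JSON.parse('{"users":{"mappings":{"user":{"properties":{"email":{"type":"string"}}}}}}')

# Assumed shape of the response after recreating the index from the model.
after = JSON.parse('{"users":{"mappings":{"user":{"properties":{"email":{"type":"string","analyzer":"whole_email"}}}}}}')

p field_analyzer(before, index: 'users', type: 'user', field: 'email') # => nil
p field_analyzer(after,  index: 'users', type: 'user', field: 'email') # => "whole_email"
```

In the application itself, the hash would presumably come from User.__elasticsearch__.client.indices.get_mapping index: User.index_name rather than from a JSON literal.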
Cam Price-Austin answered Oct 18 '22 09:10