
Multi-language Elasticsearch mapping setup

I have documents stored in MongoDB like so:

const demoArticle = {
  created: new Date(),
  title: [{
    language: 'english',
    value: 'This is the english title'
  }, {
    language: 'dutch',
    value: 'Dit is de nederlandse titel'
  }]
}

I want to assign a language-specific analyzer to each title, which is normally specified like so:

"mappings": {
   "article": {
      "properties": {
         "created": {
            "type": "date"
         },
         "title.value": {
           "type": "text",
           "analyzer": "english"
         }
      }
   }
}

The problem, however, is that the analyzer should depend on the language set at the child level: each title.value should be analyzed with the analyzer that matches its sibling language property.

I've stumbled upon dynamic templates in Elasticsearch, but I'm not convinced they are suited to this use case.

Any suggestions?

randomKek asked Jul 25 '18 07:07



1 Answer

If the language property of your MongoDB objects matches the exact name of an ES language analyzer, then all you need, following the approach recommended by Elastic, is to add:

{
  "mappings": {
    "article": {
      "properties": {
        "created": {
          "type": "date"
        },
        "title": {
          "type": "text",
          "fields": {
            "english": {
              "type": "text",
              "analyzer": "english"
            },
            "dutch": {
              "type": "text",
              "analyzer": "dutch"
            },
            "bulgarian": {
              "type": "text",
              "analyzer": "bulgarian"
            }
          }
        }
      }
    }
  }
}

This way you have a nice one-to-one match on the language/analyzer name between MongoDB and ES.
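If the list of languages grows, the "fields" block can be generated instead of written by hand. Below is a minimal sketch of that idea, not a definitive implementation. The helper names buildTitleMapping and titleFieldFor are hypothetical (they are not part of Elasticsearch or any client library), and the sketch assumes every language value in MongoDB is also the name of a built-in Elasticsearch language analyzer.

```javascript
// Sketch only: buildTitleMapping and titleFieldFor are hypothetical helpers,
// not part of Elasticsearch or any client library.

// Build the "title" multi-field mapping from a list of language names.
// Assumption: each name is also the name of a built-in ES language analyzer.
function buildTitleMapping(languages) {
  const fields = {};
  // A Set drops duplicate language names while keeping insertion order.
  for (const language of new Set(languages)) {
    fields[language] = { type: 'text', analyzer: language };
  }
  return { type: 'text', fields };
}

// Name of the sub-field to target when querying in a given language.
function titleFieldFor(language) {
  return `title.${language}`;
}

// Usage with the demo document from the question.
const demoArticle = {
  created: new Date(),
  title: [
    { language: 'english', value: 'This is the english title' },
    { language: 'dutch', value: 'Dit is de nederlandse titel' }
  ]
};

const languages = demoArticle.title.map((entry) => entry.language);
const titleMapping = buildTitleMapping(languages);

// Print the generated mapping for the "title" property.
console.log(JSON.stringify(titleMapping, null, 2));
```

The generated object can be placed under "properties" as "title" in the mapping above. When searching, titleFieldFor('dutch') gives the sub-field name to query, so the same language string drives both the mapping and the query.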

Akrion answered Sep 28 '22 04:09