
How to index and store multiple languages in ElasticSearch

I am trying to figure out how to index the following in ES.

I have a lot of documents that were crawled from websites in various languages. Each document has a category, such as airport, restaurant, river, or beach, and a language, such as Arabic or English. For example:

doc { language:"eng" , content :"something here" , category:"beach" }

doc { language:"vn" , content :"Xin chao" , category:"beach" }

I want to index and search the documents per language.

For example, I choose the English option, search with the query "here", and get the matching results.

Should I:

  1. Set up a separate Elasticsearch instance per language (one machine per language), just copying ES to run each one?

    E.g. create elasticsearch_ENGLISH, elasticsearch_VIETNAMESE

  2. Create one Elasticsearch index per language, e.g. create the indexes

/english/type/

/vietnamese/type/

    and, when I run a query, search only the index for that language?

Or do it some other way I am not aware of?

phuongdo asked Oct 18 '12 16:10



1 Answer

Not sure I fully understood your concern.

If you need to search across the full cluster (I mean, search in every language at once), you can't create one setup per language.

That said, you have many options:

  • Create one index per language and have a mapping for each index/type.
  • Use the _analyzer field to indicate the language for your document. See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-analyzer-field.html
  • Use a multifield with a different analyzer for each language. See https://www.elastic.co/guide/en/elasticsearch/reference/0.90/mapping-multi-field-type.html
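As a rough illustration of the first and third options, here is a minimal Python sketch that only builds the request bodies as plain dicts, so it runs without a cluster. Everything named here is an assumption for illustration: the `docs_<language>` index naming, the helper functions, and the language-code table are hypothetical, and the mapping uses the typeless syntax of recent Elasticsearch versions rather than the index/type layout from the question. The `english` and `arabic` analyzers are built in; as far as I know Vietnamese needs a plugin, so the sketch falls back to `standard` for it. The `_analyzer` option is left out because, to my knowledge, it was removed in Elasticsearch 2.0.

```python
# Hypothetical table: the question's language codes -> built-in analyzers.
ANALYZERS = {"eng": "english", "ar": "arabic"}
# Assumed fallback for languages without a built-in analyzer (e.g. "vn").
FALLBACK_ANALYZER = "standard"


def analyzer_for(language):
    """Pick an analyzer name for a language code."""
    return ANALYZERS.get(language, FALLBACK_ANALYZER)


def index_name(language):
    """Hypothetical naming scheme: one index per language."""
    return "docs_" + language


def index_body(language):
    """Option 1: body for creating one index per language."""
    return {
        "mappings": {
            "properties": {
                "content": {"type": "text", "analyzer": analyzer_for(language)},
                "category": {"type": "keyword"},
            }
        }
    }


def multifield_body(languages):
    """Option 3: a single index whose content field has one sub-field per language."""
    sub_fields = {
        lang: {"type": "text", "analyzer": analyzer_for(lang)} for lang in languages
    }
    return {
        "mappings": {
            "properties": {
                "content": {"type": "text", "fields": sub_fields},
                "category": {"type": "keyword"},
            }
        }
    }


def search_request(language, text):
    """Option 1: return (index to query, query body) for a language-scoped search."""
    return index_name(language), {"query": {"match": {"content": text}}}


if __name__ == "__main__":
    print(index_name("eng"), index_body("eng"))
    print(search_request("vn", "Xin chao"))
    print(multifield_body(["eng", "vn"]))
```

Each body would be sent when creating the index, and the search body to the `_search` endpoint of the index returned by `search_request`. With the multi-field variant you would instead match on a sub-field such as `content.eng`.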

It's not a full answer, but it gives you some clues to help.

dadoonet answered Oct 15 '22 05:10