
ElasticSearch & Tire: Using Mapping and to_indexed_json

While reading the Tire doc, I was under the impression that you should use either the mapping or the to_indexed_json method, since (as I understood it) the mapping is used to feed to_indexed_json.

The problem is that I found some tutorials where both are used. Why?

Basically, my app works right now with to_indexed_json, but I can't figure out how to set the boost value of some of the attributes (hence the reason I started looking at mapping), and I was wondering if using both would create conflicts.

Alain asked Jul 26 '12 14:07



1 Answer

The mapping and to_indexed_json methods are related, but they serve two different purposes.

The purpose of the mapping method is to define the mapping for the document properties within an index. You may want to define a certain property as "not_analyzed", so it is not broken into tokens, or set a specific analyzer for the property, or (as you mention) set an index-time boost factor. You may also define multi-field properties, custom formats for date types, etc.

This mapping is then used, e.g., when Tire automatically creates an index for your model.
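As a rough illustration of what such a mapping amounts to, here is a minimal sketch that builds the kind of mapping body Elasticsearch accepted in the Tire era, using only Ruby's standard library. The `article` type and all field names are hypothetical, and the `string` / `not_analyzed` syntax reflects old Elasticsearch versions; newer releases use different field types. In Tire itself you would declare these options inside the model's `mapping` block rather than build the Hash by hand.

```ruby
require 'json'

# Hypothetical "article" document type; every field name here is illustrative.
mapping = {
  article: {
    properties: {
      # Kept as a single token, never run through an analyzer.
      id:           { type: 'string', index: 'not_analyzed' },
      # Specific analyzer plus an index-time boost factor.
      title:        { type: 'string', analyzer: 'snowball', boost: 10.0 },
      content:      { type: 'string', analyzer: 'snowball' },
      # Custom date format.
      published_on: { type: 'date', format: 'yyyy-MM-dd' }
    }
  }
}

mapping_json = JSON.pretty_generate(mapping)
puts mapping_json
```

Note that nothing in this structure says which values a document will carry; it only describes how Elasticsearch should treat each property.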

The purpose of the to_indexed_json method is to define a JSON serialization for your documents/models.

The default to_indexed_json method does use your mapping definition: it serializes only the properties defined in the mapping, on the basis that if you care enough to define a mapping, Tire should by default index only the properties that have one.
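The default behaviour described here can be pictured with a small sketch. This is not Tire's source code, only an illustration of the idea under the assumption that the mapped property names are available as a list; `MAPPED_PROPERTIES` and `default_indexed_json` are hypothetical names.

```ruby
require 'json'

# Illustration only, not Tire's implementation.
# Hypothetical list of property names that have a mapping defined.
MAPPED_PROPERTIES = %w[title content].freeze

# Serializes only the attributes whose names appear in the mapping.
def default_indexed_json(attributes, mapped = MAPPED_PROPERTIES)
  attributes.select { |name, _value| mapped.include?(name.to_s) }.to_json
end

attributes = {
  'title'           => 'Hello',
  'content'         => 'World',
  'password_digest' => 'secret' # has no mapping, so it is left out
}

json = default_indexed_json(attributes)
puts json
```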

Now, when you want a tight grip on how your model is in fact serialized into JSON for elasticsearch, you just define your own to_indexed_json method (as the README instructs).

This custom MyModel#to_indexed_json method usually does not care about the mapping definition, and builds the JSON serialization from scratch (by leveraging ActiveRecord's to_json, using a JSON builder such as jbuilder, or just building a plain old Hash and calling Hash#to_json).
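A minimal sketch of the "plain old Hash" approach follows. The `Article` class is hypothetical, and a plain Ruby class stands in for an ActiveRecord model so the example runs on its own; in a real app the method would be defined on the model that includes Tire's search module.

```ruby
require 'json'

# Hypothetical model; a plain Ruby class stands in for an ActiveRecord model.
class Article
  attr_reader :title, :content, :tags

  def initialize(title, content, tags)
    @title   = title
    @content = content
    @tags    = tags
  end

  # Builds the indexed document from scratch; it never consults the mapping.
  def to_indexed_json
    {
      title:      title,
      content:    content,
      tag_names:  tags.map(&:downcase), # derived field, not a model attribute
      word_count: content.split.size    # computed at serialization time
    }.to_json
  end
end

article = Article.new('Hello', 'One two three', %w[Ruby Search])
doc = JSON.parse(article.to_indexed_json)
puts doc
```

Because the serialization is independent of the mapping, a field such as `title` can still get its analyzer or boost from the mapping, while fields like `tag_names` exist only because this method emits them.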

So, to answer the last part of your question, using both mapping and to_indexed_json will absolutely not create any conflicts, and is in fact required to use advanced features in elasticsearch.

To sum up:

  1. You use the mapping method to define the mapping for your models for the search engine.
  2. You use a custom to_indexed_json method to define how the search engine sees your documents/models.
karmi answered Sep 19 '22 14:09