In my Elasticsearch server I have one index, http://localhost:9200/blog.
The blog index contains multiple types, e.g. http://localhost:9200/blog/posts and http://localhost:9200/blog/tags.
In the tags type I have created more than 1000 tags, and in the posts type I have created 10 posts.
e.g.: posts
{ "_index":"blog", "_type":"posts", "_id":"1", "_version":3, "found":true, "_source" : { "catalogId" : "1", "name" : "cricket", "url" : "http://www.wikipedia/cricket" } }
e.g.: tags
{ "_index":"blog", "_type":"tags", "_id":"1", "_version":3, "found":true, "_source" : { "tagId" : "1", "name" : "game" } }
I want to assign existing tags to blog posts (i.e. a relationship expressed in the mapping).
How do I assign tags to posts in the mapping?
Mapping is the process of defining how a document, and the fields it contains, are stored and indexed. Each document is a collection of fields, which each have their own data type. When mapping your data, you create a mapping definition, which contains a list of fields that are pertinent to the document.
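As a rough illustration of a mapping definition, the sketch below builds one for the posts documents shown in the question. The field types (`keyword`, `text`) are assumptions, since the question does not state them, and the typeless `mappings.properties` layout assumes Elasticsearch 7 or later.

```python
import json

# Hypothetical mapping for the "posts" documents from the question.
# Field names come from the sample document; the field types are assumptions.
posts_mapping = {
    "mappings": {
        "properties": {
            "catalogId": {"type": "keyword"},  # exact-match identifier
            "name": {"type": "text"},          # analyzed for full-text search
            "url": {"type": "keyword"},        # stored as a single exact value
        }
    }
}

# The JSON below is the body you would send when creating the index,
# for example with: PUT http://localhost:9200/posts
print(json.dumps(posts_mapping, indent=2))
```

The block only builds and prints the request body; sending it is left to whichever HTTP client or Elasticsearch client you already use.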
If you want to use a single index, you need to define a single mapping that combines the fields of each document type. A better way might be to define separate indices on the same cluster for each document type.
When you create a nested document, Elasticsearch actually indexes two separate documents (the root object and the nested object), then relates the two internally. Both documents are stored in the same Lucene block on the same shard, so read performance is still very fast. This arrangement does come with some disadvantages.
Joining queries: performing full SQL-style joins in a distributed system like Elasticsearch is prohibitively expensive. Instead, Elasticsearch offers two forms of join which are designed to scale horizontally. Documents may contain fields of type nested. These fields are used to index arrays of objects, where each object can be queried (with the nested query) as an independent document.
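Applied to the question, a nested field could hold each post's tags. The sketch below is one possible shape, not the only one: the field name `tags`, its sub-field types, and the helper `nested_tag_query` are assumptions made for illustration. The `nested` field type and the `nested` query with its `path` parameter are standard Elasticsearch features.

```python
import json

# Assumed mapping: every post stores its tags in a "nested" field.
posts_mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "tags": {
                "type": "nested",
                "properties": {
                    "tagId": {"type": "keyword"},
                    "name": {"type": "keyword"},
                },
            },
        }
    }
}


def nested_tag_query(tag_name):
    """Build a nested query that matches posts having a tag with this name."""
    return {
        "query": {
            "nested": {
                "path": "tags",  # the nested field to search inside
                "query": {"term": {"tags.name": tag_name}},
            }
        }
    }


# Body for: GET http://localhost:9200/posts/_search
print(json.dumps(nested_tag_query("game"), indent=2))
```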
There are 4 approaches that you can use within Elasticsearch for managing relationships. They are very well outlined in the Elasticsearch blog post "Managing Relations Inside Elasticsearch". I would recommend reading the entire article to get more details on each approach, and then selecting the approach that best meets your business needs while remaining technically appropriate.
Here are the highlights for the 4 approaches.
Inner Object
- Easy, fast, performant
- Only applicable when one-to-one relationships are maintained
- No need for special queries
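To see why inner objects suit one-to-one relationships, it helps to look at how an array of inner objects is indexed: the objects are flattened into one multi-valued field per key, so the link between values of the same object is lost. The function below is a plain-Python simulation of that flattening, not Elasticsearch code, and the second tag ("sport") is made up for the example.

```python
from collections import defaultdict


def flatten_inner_objects(field, objects):
    """Mimic inner-object indexing: one multi-valued field per key."""
    flat = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)


# Two tags attached to one post; the "sport" tag is hypothetical.
tags = [
    {"tagId": "1", "name": "game"},
    {"tagId": "2", "name": "sport"},
]

flat = flatten_inner_objects("tags", tags)
print(flat)
```

After flattening, a query for `tags.tagId` = "1" together with `tags.name` = "sport" would match this post even though no single tag has both values. With exactly one inner object per post the problem cannot occur.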
Nested
- Nested docs are stored in the same Lucene block as each other, which helps read/query performance. Reading a nested doc is faster than the equivalent parent/child.
- Updating a single field in a nested document (parent or nested children) forces ES to reindex the entire nested document. This can be very expensive for large nested docs
- “Cross referencing” nested documents is impossible
- Best suited for data that does not change frequently
Parent/Child
- Children are stored separately from the parent, but are routed to the same shard. So parent/child is slightly less performant on read/query than nested
- Parent/child mappings have a bit extra memory overhead, since ES maintains a “join” list in memory
- Updating a child doc does not affect the parent or any other children, which can potentially save a lot of indexing on large docs
- Sorting/scoring can be difficult with Parent/Child since the Has Child/Has Parent operations can be opaque at times
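A minimal sketch of the parent/child approach, assuming Elasticsearch 6 or later, where it is expressed with a `join` field. The field name `relation` and the relation names `post` and `tag` are hypothetical choices. Note that a child has exactly one parent, so under this layout a tag shared by several posts would have to be indexed once per post.

```python
import json

# Assumed mapping: a join field declaring "post" as the parent of "tag".
join_mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "relation": {"type": "join", "relations": {"post": "tag"}},
        }
    }
}


def parent_post(name):
    """A parent document only states which side of the relation it is."""
    return {"name": name, "relation": {"name": "post"}}


def child_tag(tag_id, name, parent_post_id):
    """A child document names its parent; it must be indexed with
    routing set to the parent id so both land on the same shard."""
    return {
        "tagId": tag_id,
        "name": name,
        "relation": {"name": "tag", "parent": parent_post_id},
    }


def posts_with_tag(name):
    """has_child query: return parent posts that have a matching child tag."""
    return {
        "query": {
            "has_child": {"type": "tag", "query": {"match": {"name": name}}}
        }
    }


print(json.dumps(child_tag("1", "game", "1"), indent=2))
print(json.dumps(posts_with_tag("game"), indent=2))
```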
Denormalization
- You get to manage all the relations yourself!
- Most flexible, most administrative overhead
- May be more or less performant depending on your setup
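For the posts and tags in the question, denormalization could mean copying the chosen tags into each post at index time and keeping the copies in sync yourself. The sketch below assumes hypothetical field names `tagIds` and `tags`, and assumes `tagIds` is mapped as `keyword` so that a `term` query matches it exactly.

```python
def denormalize(post, tags_by_id, tag_ids):
    """Return a copy of the post with the selected tags copied into it."""
    doc = dict(post)
    doc["tagIds"] = list(tag_ids)
    doc["tags"] = [tags_by_id[tag_id]["name"] for tag_id in tag_ids]
    return doc


def posts_for_tag(tag_id):
    """Search body for posts carrying a given tag id (assumes keyword mapping)."""
    return {"query": {"term": {"tagIds": tag_id}}}


# Sample data taken from the documents in the question.
post = {"catalogId": "1", "name": "cricket", "url": "http://www.wikipedia/cricket"}
tags_by_id = {"1": {"tagId": "1", "name": "game"}}

doc = denormalize(post, tags_by_id, ["1"])
print(doc)
print(posts_for_tag("1"))
```

The trade-off is the one listed above: if a tag is renamed, every post that copied its name has to be reindexed by your own code.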