Best practice for handling many-to-many relationships in Elasticsearch?

Tags: elasticsearch

I'm pretty sure I know the answer to this question but am looking for confirmation from someone with more Elasticsearch experience than me.

Let's say I've got a database containing Authors and Books. An author can be associated with 0 or more books, and a book can be associated with 1 or more authors. We want users to be able to search on author name to find the author and all his/her books, and we also want them to be able to search on book title to get back its author(s). We know there will be plenty of multi-author books.

Because Elasticsearch only directly supports one level of parent-child relationships, and because children can only have one parent, it seems to me that we need to denormalize the data and use nested objects to establish this relationship. If we modify properties of an author who has published 23 books, we will need to reindex the author record and all 23 of his/her book records.
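To make the reindexing cost concrete, here is a minimal sketch of that denormalized approach. It assumes a mapping in which each book carries its authors in a `nested` field, and it uses made-up in-memory data in place of the real database. The field names (`authors`, `id`, `name`), the sample records, and the helper functions are all hypothetical, and the mapping syntax follows recent Elasticsearch versions without mapping types.

```python
# Assumed index mapping: each book embeds its authors as nested objects.
BOOK_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "authors": {
                "type": "nested",
                "properties": {
                    "id": {"type": "keyword"},
                    "name": {"type": "text"},
                },
            },
        }
    }
}

# Hypothetical source-of-truth records (stand-in for the primary database).
authors = {
    "a1": {"name": "Ann Author"},
    "a2": {"name": "Bob Writer"},
}
books = {
    "b1": {"title": "First Book", "author_ids": ["a1"]},
    "b2": {"title": "Joint Work", "author_ids": ["a1", "a2"]},
    "b3": {"title": "Solo Effort", "author_ids": ["a2"]},
}


def build_book_doc(book_id):
    """Build the denormalized document that would be indexed for one book."""
    book = books[book_id]
    return {
        "title": book["title"],
        "authors": [
            {"id": author_id, "name": authors[author_id]["name"]}
            for author_id in book["author_ids"]
        ],
    }


def docs_to_reindex(author_id):
    """Return fresh documents for every book that embeds the given author."""
    return {
        book_id: build_book_doc(book_id)
        for book_id, book in books.items()
        if author_id in book["author_ids"]
    }


# Changing one author means rebuilding every book document that embeds them.
authors["a1"]["name"] = "Ann B. Author"
stale = docs_to_reindex("a1")
```

With this sample data, `stale` holds rebuilt documents for `b1` and `b2` only; an author with 23 books would produce 23 documents to send back to the index.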

In my fantasy world, I'd love to have those 23 books each contain an array of author IDs so that I don't have to reindex books when I reindex authors. It seems like this would definitely be possible using Elasticsearch's parent-child support if a book could only have one author, but because of the many-to-many requirement, I have to use nested objects and reindex any related objects whenever anything changes.

Is this correct? It certainly seems like more work (and certainly more updates), but I want to do this the right way, not the "clever" way that introduces complexity and bugs and madness.

Any guidance would be appreciated.


asked Oct 29 '14 20:10

Joel P.

1 Answer

From your question I assume that ES will not be your primary data store. How to denormalise your many-to-many relationship therefore depends on how you will use ES and for what, that is, which queries you expect to build.

Think in terms of "query command" design and denormalise accordingly. Here are a few pointers:

Denormalising author IDs into the book: would you expect a user to run a search such as "all books for userId=XYZ"? If not, you would instead need the author's name as a multi-field in your Book document.
Duplicate, duplicate and duplicate. Figure out which data will be heavily updated (authors, since books generally do not gain authors after publication). Denormalise the author into books (most likely the names). Duplicate, into another document type, something like "author_books", which would be a child of authors and would support fairly frequent updates (again, denormalise the title and other relevant fields so you can search from the author's perspective).
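The pointers above can be sketched roughly as follows. This is only an illustration under stated assumptions: the sample rows, the field names (`author_names`, `parent_author_id`, `book_id`) and the helper functions are invented for the example, and the parent-child link is shown as a plain field rather than through any particular Elasticsearch parent-child feature.

```python
# Hypothetical rows from the primary data store.
AUTHORS = {1: "Ann Author", 2: "Bob Writer"}
BOOKS = {10: "Joint Work", 11: "Solo Effort"}
AUTHORSHIP = [(1, 10), (2, 10), (2, 11)]  # (author_id, book_id) pairs


def book_documents():
    """One document per book, with author names copied in for title/name search."""
    docs = {}
    for book_id, title in BOOKS.items():
        names = [AUTHORS[a] for a, b in AUTHORSHIP if b == book_id]
        docs[book_id] = {"title": title, "author_names": names}
    return docs


def author_book_documents():
    """One small "author_books" document per (author, book) pair.

    Each one copies the book title so that searches from the author's side
    never need to touch the book documents.
    """
    return [
        {"parent_author_id": a, "book_id": b, "title": BOOKS[b]}
        for a, b in AUTHORSHIP
    ]


def match_query(field, text):
    """Build a basic full-text `match` query body for the given field."""
    return {"query": {"match": {field: text}}}


# Searching books by author name only touches the book documents.
by_author_name = match_query("author_names", "Ann")
```

The trade-off is deliberate duplication: a title lives in the book document and in each "author_books" document, and an author name lives in every book that lists it, so each query can be answered from a single document type.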

Hope this makes some sense ;)


answered Sep 29 '22 17:09

gamars
