Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

elasticsearch: Keep redundant (denormalized) data or keep a list of ids for cross-referencing?

Is it better to store redundant data in an index or to have two indexes and then cross reference them? ie, A user wants to look up movies at a theater.

Since the user is interested in looking up movies by theater, we can have a theater index with movies nested_type:

// Here, movies field will be a Nested Type, not Object Type.

eclient.index({
    index: 'myindex',
    type: 'theater',
    id: 1,
    body: {
        name: "Grand Cinema",
        description: "Come watch movies!",
        movies: [
           {
               title: "Red November",
               description: "A submarine hunt",
               rated: "R",
               score: 10.0
           },
           {
               title: "Cinderbrella",
               description: "A burnt umbrella",
               rated: "PG",
               score: 8.8
           }
        ]

    }
});

This makes it easy to display a short list of movies by theater since the data is denormalized on theater.

We can also have a movie index which holds more detailed info about a movie:

eclient.index({
    index: 'myindex',
    type: 'movie',
    id: 1,
    body: {
        title: "Red November",
        description: "A submarine hunt",
        rated: "R",
        score: 10.0,
        actors: ["Bob", "Alice", "Carol"],
        // other details...
    }
});

And so when the user clicks on a movie to get more information, I can query the movie index and get its detailed info (ie, actors).

As you can see the theater holds lots of redundant data. It also makes it a pain to reindex if a movie gets an updated field. There will be two places to reindex: movies in theater index and in the movie index itself. ie, a movie gets an updated score.

I suppose I can change it so that theater index holds a bunch of movie ids, and have movies hold a bunch of theater ids:

// theater index
type: 'theater',
id: 1,
body: {
    name: ...
    description: ...
    movies: [ 1, 2 ]
}

// movie index
type: 'movie',
id: 1,
body: {
    title: ...
    description: ...
    theaters: [ 1, 2, 3]
}

But in this case how would I efficiently query movie information from theater? I would have to get the movie ids, then query them one-by-one on the movie index just to get some limited movie information to display on the theater page. However, if a field changes on a particular movie, then it's a lot less work to reindex.

Which is the better solution? I am more inclined to think the latter, because even if movies rarely change its data, to reindex every theater that holds that particular movie and to reindex the movie index itself may be a huge waste in computing power (bottleneck?).

A third solution is to have theater hold movies ids, and then just query the Postgres database on those ids. Although this is probably slower than querying them individually through the movie index?

like image 680
user21398 Avatar asked Jan 11 '15 05:01

user21398


1 Answers

What you are looking for is the parent child relation types.

Under same index , you can define one type as the parent of another. In this case , you can keep all your movie documents in movie type and theater documents under theater type. The advantage of this approach is that

  1. You can search a movie based on a condition of the theater document. For eg: , give me all movies released in theaters based in India. LINK - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-parent-query.html
  2. You can search theater based on movie, For eg: , give me all theatres where the movie "back to the future" is released. LINK - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-query.html

You can find more information on parent child here

Advantages of parent child relation over nested documents

  1. You can edit/update movie document without touching the theater information.
  2. You can add/delete/edit theater documents without touching movie information
like image 61
Vineeth Mohan Avatar answered Sep 28 '22 01:09

Vineeth Mohan