
What are the performance drawbacks of flat documents vs. nested ones?

I have data that naturally fits into documents like:

{
  "name": "Multi G. Enre",
  "books": [
    {
      "name": "Guns and lasers",
      "genre": "scifi",
      "publisher": "orbit"
    },
    {
      "name": "Dead in the night",
      "genre": "thriller",
      "publisher": "penguin"
    }
  ]
}

(the example is taken from a good review of nested and has_child documents)

In order to analyze them in Kibana and other software (a mix of legacy and laziness), they are flattened:

{
  "name": "Multi G. Enre",
  "book_name": "Guns and lasers",
  "book_genre": "scifi",
  "book_publisher": "orbit"
}
{
  "name": "Multi G. Enre",
  "book_name": "Dead in the night",
  "book_genre": "thriller",
  "book_publisher": "penguin"
}
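The flattening step described above can be sketched as a small Python function that emits one document per book and copies the author-level fields into each one. This is a minimal sketch that assumes the exact layout of the two examples (a `books` list whose keys get a `book_` prefix); the function name `flatten_author` is hypothetical.

```python
from typing import Any


def flatten_author(author: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn one nested author document into one flat document per book.

    Assumes the layout from the example: top-level author fields plus a
    "books" list; every book key is prefixed with "book_".
    """
    # Author-level fields (everything except the nested list) are copied
    # into every flat document: this is the denormalization step.
    author_fields = {k: v for k, v in author.items() if k != "books"}
    flat_docs = []
    for book in author.get("books", []):
        doc = dict(author_fields)
        doc.update({f"book_{key}": value for key, value in book.items()})
        flat_docs.append(doc)
    return flat_docs


nested = {
    "name": "Multi G. Enre",
    "books": [
        {"name": "Guns and lasers", "genre": "scifi", "publisher": "orbit"},
        {"name": "Dead in the night", "genre": "thriller", "publisher": "penguin"},
    ],
}

for doc in flatten_author(nested):
    print(doc)
```

Note that with this scheme an author with no books produces no flat documents at all, which may or may not matter for the analysis.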

Besides the obvious growth in index size, is there generally a performance impact when querying such flat records (the queries are of the type "writer with scifi books from penguin") compared with nested ones, or with parent/child ones?
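For reference, a query like "writer with scifi books from penguin" takes a different shape in each model. The sketch below builds the three request bodies as Python dicts in Elasticsearch query DSL (`bool`/`term`, `nested`, `has_child`). The field names, the `book` child relation name, and the assumption that `name` and the book fields are mapped as `keyword` are illustrative, not taken from the question. It also evaluates the flat version locally over the sample documents, via a hypothetical helper `matching_writers`.

```python
from typing import Any

GENRE, PUBLISHER = "scifi", "penguin"

# Flat model: each document holds exactly one book, so a plain bool filter
# already requires genre and publisher to come from the same book.
# The terms aggregation lists distinct writers (assumes "name" is a keyword field).
flat_query = {
    "size": 0,
    "query": {"bool": {"filter": [
        {"term": {"book_genre": GENRE}},
        {"term": {"book_publisher": PUBLISHER}},
    ]}},
    "aggs": {"writers": {"terms": {"field": "name"}}},
}

# Nested model: "books" must be mapped as type "nested"; the nested query
# checks both conditions against the same hidden book document.
nested_query = {
    "query": {"nested": {
        "path": "books",
        "query": {"bool": {"filter": [
            {"term": {"books.genre": GENRE}},
            {"term": {"books.publisher": PUBLISHER}},
        ]}},
    }}
}

# Parent/child model: books are separate child documents joined to their
# author; "book" is a hypothetical child relation name.
parent_child_query = {
    "query": {"has_child": {
        "type": "book",
        "query": {"bool": {"filter": [
            {"term": {"genre": GENRE}},
            {"term": {"publisher": PUBLISHER}},
        ]}},
    }}
}


def matching_writers(flat_docs: list[dict[str, Any]], genre: str, publisher: str) -> list[str]:
    """Local stand-in for the flat query: distinct writer names, sorted."""
    return sorted({d["name"] for d in flat_docs
                   if d["book_genre"] == genre and d["book_publisher"] == publisher})


flat_docs = [
    {"name": "Multi G. Enre", "book_name": "Guns and lasers",
     "book_genre": "scifi", "book_publisher": "orbit"},
    {"name": "Multi G. Enre", "book_name": "Dead in the night",
     "book_genre": "thriller", "book_publisher": "penguin"},
]

print(matching_writers(flat_docs, "scifi", "penguin"))     # []
print(matching_writers(flat_docs, "thriller", "penguin"))  # ['Multi G. Enre']
```

The sample author is correctly not returned for scifi + penguin, since the scifi book is published by orbit: in both the flat and the nested model the two conditions are checked against a single book.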

WoJ asked Feb 18 '16 13:02



1 Answer

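Querying the flat index will be much, MUCH faster! Each flat query is a plain filter on single documents, whereas nested queries have to join hidden sub-documents back to their root, and has_child queries join across separate documents. Elasticsearch's own "Tune for search speed" guide warns that nested can make queries several times slower and parent/child relations hundreds of times slower. The whole idea behind NoSQL databases is to denormalize your data.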

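In your first example, notice that you would need to update that record every time you add a book. That is a big no-no in ES (and NoSQL stores in general). Elasticsearch documents are immutable: behind the scenes an update is really a delete plus an insert of the whole document, which is very expensive. With nested documents it is worse, because every book is stored as a hidden document next to the root, and all of them are reindexed together.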
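The write-side difference can be made concrete. This is a minimal sketch, assuming hypothetical index names (`authors_flat`, `authors_nested`), document IDs, and a hypothetical new book: in the flat model adding a book is one new document in a `_bulk` request, while in the nested model the whole author document, including every existing book, is sent and reindexed (even a partial `_update` call reindexes the full document internally). The Lucene document count is a rough model based on each nested object being stored as its own hidden document.

```python
import json
from typing import Any


def bulk_index_lines(index: str, doc_id: str, source: dict[str, Any]) -> str:
    """Return one `index` action and its source in the NDJSON format used by _bulk."""
    action = {"index": {"_index": index, "_id": doc_id}}
    return json.dumps(action) + "\n" + json.dumps(source) + "\n"


def lucene_docs_written(model: str, existing_books: int) -> int:
    """Approximate Lucene documents written when one book is added.

    flat:   only the new book's own document.
    nested: the root plus one hidden document per book (old and new),
            because the whole block is reindexed together.
    """
    if model == "flat":
        return 1
    if model == "nested":
        return 1 + existing_books + 1
    raise ValueError(f"unknown model: {model}")


existing_books = [
    {"name": "Guns and lasers", "genre": "scifi", "publisher": "orbit"},
    {"name": "Dead in the night", "genre": "thriller", "publisher": "penguin"},
]
# Hypothetical new book, for illustration only.
new_book = {"name": "Third book", "genre": "scifi", "publisher": "penguin"}

# Flat model: a single new document (index name and ID are hypothetical).
flat_request = bulk_index_lines("authors_flat", "enre-3", {
    "name": "Multi G. Enre",
    **{f"book_{k}": v for k, v in new_book.items()},
})

# Nested model: the full author document, with every book, is sent again.
nested_request = bulk_index_lines("authors_nested", "enre", {
    "name": "Multi G. Enre",
    "books": existing_books + [new_book],
})

print(flat_request, end="")
print(nested_request, end="")
print(lucene_docs_written("flat", len(existing_books)))    # 1
print(lucene_docs_written("nested", len(existing_books)))  # 4
```

The trade-off runs the other way for author-level changes: renaming the author in the flat model means updating every flat document that carries that name.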

jhilden answered Oct 19 '22 01:10
