Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

At what rate do indexes "explode" in GAE's big table?

At what rate do indexes "explode" in GAE's big table?

The excerpt from their documentation below explains that for collection values, indexes can "explode" exponentially.

Does this mean that for an object with two collection values, there is an index entry for each subset of values in the first collection paired with each subset in the second collection? Or is there only an index entry for each possible pair of values?

Example:

Entity:

widget:{
     mamas_list:         ['cookies', 'puppies']
     papas_list:         ['rain', 'sun']
    }

Index entry for each subset of values in the first collection paired with each subset in the second collection:

cookies         rain
cookies puppies rain
cookies puppies rain sun
cookies         sun
cookies         rain sun
puppies         rain
puppies         sun
puppies         rain sun

Only an index entry for each possible pair of values:

cookies         rain
cookies         sun
puppies         rain
puppies         sun

Exploding indexes excerpt:

Source: https://developers.google.com/appengine/docs/python/datastore/indexes#Index_Limits

an entity that can have multiple values for the same property requires a separate index entry for each value; again, if the number of possible values is large, such an entity can exceed the entry limit.

The situation becomes worse in the case of entities with multiple properties, each of which can take on multiple values. To accommodate such an entity, the index must include an entry for every possible combination of property values. Custom indexes that refer to multiple properties, each with multiple values, can "explode" combinatorially, requiring large numbers of entries for an entity with only a relatively small number of possible property values. (Taken from: )

like image 794
Chris Dutrow Avatar asked Nov 03 '22 22:11

Chris Dutrow


1 Answers

Chris,

You'll only have an 'exploding index problem' in cases you explicitly add an index.yaml entry for multiple repeated properties, and when objects saved to the table have too many multiple properties.

In the example, does your index.yaml add this index?

- kind: widget
  properties:
  - name: mamas_list
  - name: papas_list

If you save the sample object to the datastore:

widget(mamas_list=['a', 'b'], papas_list['c', 'd']).put()

There will be 4 different indexes to be saved:

['a', 'c'] ['a', 'd'] ['b', 'c'] ['b', 'd']

The whole purpose of adding this index would be to allow querying by these 2 properties:

widget.query().filter(mamas_list=='a').filter(papas_list=='d').fetch()

You could always avoid an exploding index (not found in this sample case), using the zig-zag algorithm indexes:

http://www.google.com/events/io/2010/sessions/next-gen-queries-appengine.html

like image 155
Felipe Hoffa Avatar answered Nov 11 '22 16:11

Felipe Hoffa