
Modeling Hierarchical Data - GAE

I'm new to Google App Engine and the Google datastore (Bigtable), and I have some doubts about the best approach to designing the data model I need.

I need to create a hierarchical model, something like a product catalog, where each domain has several levels of subdomains. For the moment, the product structure changes far less often than it is read. A wine example:

  • Origin (Toscana, Priorat, Alsace)
  • Winery (Belongs only to one Origin)
  • Wine (Belongs only to one Winery)

All the relations are disjoint and incomplete. Additionally, depending on the requirements, we will probably need to store usage counters for every wine (which could require transactions).

According to the documentation, there seem to be several potential solutions:

  • Ancestor management: using parent relations and transactions
  • Pseudo-ancestor management: simulating ancestors with a db.ListProperty(db.Key)
  • ReferenceProperty: specifying the relation between the classes explicitly

But given the requests we expect for wines (sometimes by variety, sometimes by origin, sometimes by winery), I'm worried about how queries will behave with these structures. It is like the multiple joins in a relational model: to get the products of a family, you have to join from the deepest qualifier in the product tree all the way up to the family.

Maybe it is better to duplicate some information. The Google team's recommendation is that operations are expensive but storage is not, so duplicated content should not be seen as the main problem.

Some answers to similar questions suggest:

  • Storing all the parent ids as a hierarchy in a string, like a path property
  • Duplicating the relations between the Drink entity and all of its parents in the tree ...
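
To make the two suggestions concrete, here is a minimal pure-Python sketch (no datastore involved). It assumes each wine record carries copies of its ancestors' ids; the `Wine` tuple, the sample wines, and the helpers `wines_where` and `path_of` are hypothetical names used only for illustration.

```python
from collections import namedtuple

# Hypothetical flattened record: each wine duplicates its ancestors' ids.
Wine = namedtuple('Wine', ['name', 'origin', 'winery', 'variety'])

# Made-up sample data, for illustration only.
WINES = [
    Wine('Rosso Uno', 'toscana', 'latoscana', 'merlot'),
    Wine('Rosso Due', 'toscana', 'latoscana', 'sangiovese'),
    Wine('Negre', 'priorat', 'celler-a', 'garnacha'),
]


def wines_where(wines, **criteria):
    """Return wines whose fields equal every criterion (one pass, no joins)."""
    return [w for w in wines
            if all(getattr(w, field) == value
                   for field, value in criteria.items())]


def path_of(wine):
    """Build the path-string form of the same hierarchy."""
    return '/origin/%s/winery/%s/variety/%s/' % (
        wine.origin, wine.winery, wine.variety)
```

With the ancestors duplicated on every wine, "all wines from toscana" becomes a single equality test on one kind, at the price of rewriting those copies if a winery ever moves to another origin.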

Any suggestions?


Hi Will,

Our case is closer to the strict hierarchical approach you describe in the second example. The queries are for retrieving lists of products; retrieving a single one is not usual.

We need to retrieve all the wines from an Origin, from a Winery, or from a Variety (supposing the variety is another node of the strict hierarchical tree; this is only an example).

One way could be to include a path property, as you mentioned:

  • /origin/{id}/winery/{id}/variety/{id}

This would allow me to retrieve the list of wines of a variety with a query like this:

wines_query = Wine.all()
wines_query.filter('key_name >','/origin/toscana/winery/latoscana/variety/merlot/')
wines_query.filter('key_name <','/origin/toscana/winery/latoscana/variety/merlot/zzzzzzzz')

Or like this from an Origin:

wines_query = Wine.all()
wines_query.filter('key_name >','/origin/toscana/')
wines_query.filter('key_name <','/origin/toscana/zzzzzz')
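
The two inequality filters above amount to a prefix range scan over sorted strings. The sketch below simulates that in plain Python with `bisect`, as an illustration rather than datastore code. It uses `u'\ufffd'` as the upper sentinel instead of a run of `z` characters, a common choice because it sorts after nearly every character likely to appear in an id. The names `prefix_range`, `scan`, and the sample keys are hypothetical.

```python
import bisect


def prefix_range(prefix):
    """Return (low, high) so that low <= key < high matches the prefix."""
    # u'\ufffd' sorts after almost any character likely to appear in an id,
    # which makes it a safer upper sentinel than 'zzzzzz'.
    return prefix, prefix + u'\ufffd'


def scan(sorted_keys, prefix):
    """Simulate an index range scan over an already sorted list of keys."""
    low, high = prefix_range(prefix)
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_left(sorted_keys, high)
    return sorted_keys[start:end]


# Made-up key names, for illustration only.
KEYS = sorted([
    '/origin/priorat/winery/celler-a/variety/garnacha/w1',
    '/origin/toscana/winery/latoscana/variety/merlot/w2',
    '/origin/toscana/winery/latoscana/variety/merlot/zzzzzzzzzz',
    '/origin/toscana/winery/latoscana/variety/sangiovese/w3',
])
```

Note that the third sample key sorts after the bound `.../merlot/zzzzzzzz`, so a fixed run of `z` characters would silently drop it, while the sentinel-based range still returns it.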

Thank you!

asked Nov 14 '22 07:11 by Iván Peralta


1 Answer

I'm not sure what kinds of queries you'll need to do in addition to those mentioned in the question, but storing the data in an explicit ancestor hierarchy would make the ones you asked about fall out pretty easily.

For example, to get all wines from a particular origin:

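from google.appengine.ext import db
# Build the Origin's key, then ask for every Wine stored beneath it
origin_key = db.Key.from_path('Origin', 123)
wines_query = db.Query(Wine).ancestor(origin_key)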

or to get all wines from a particular winery:

# A Winery's key includes its parent Origin, so both ids are needed
origin_key = db.Key.from_path('Origin', 123)
winery_key = db.Key.from_path('Winery', 456, parent=origin_key)
wines_query = db.Query(Wine).ancestor(winery_key)

and, assuming you're storing the variety as a property on the Wine model, all wines of a particular variety is as simple as

wines_query = Wine.all().filter('variety =', 'merlot')
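
To see why the ancestor queries fall out so easily, it can help to think of a key as a path of (kind, id) pairs, with an ancestor query as a prefix match on that path. The following is a plain-Python sketch of that idea, not datastore code; `key_path`, `has_ancestor`, `ancestor_query`, and the sample ids are hypothetical.

```python
def key_path(*pairs):
    """Build a key path such as (('Origin', 123), ('Winery', 456))."""
    return tuple(pairs)


def has_ancestor(key, ancestor):
    """True when the ancestor's path is a prefix of the key's path."""
    return key[:len(ancestor)] == ancestor


def ancestor_query(keys, kind, ancestor):
    """Keys of the given kind that sit anywhere below the ancestor."""
    return [k for k in keys if k[-1][0] == kind and has_ancestor(k, ancestor)]


origin = key_path(('Origin', 123))
winery = key_path(('Origin', 123), ('Winery', 456))

# Made-up keys, for illustration only.
WINE_KEYS = [
    key_path(('Origin', 123), ('Winery', 456), ('Wine', 1)),
    key_path(('Origin', 123), ('Winery', 789), ('Wine', 2)),
    key_path(('Origin', 999), ('Winery', 456), ('Wine', 3)),
]
```

Here `ancestor_query(WINE_KEYS, 'Wine', origin)` matches the first two keys, while the same call with `winery` matches only the first, because the third key has winery id 456 under a different origin.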

One possible downside of this strict hierarchical approach is the kind of URL scheme it can impose on you. With a hierarchy that looks like

Origin -> Winery -> Wine

you must know the key name or ID of a wine's origin and winery in order to build a key to retrieve that wine, unless you've already got the string representation of the wine's key. This basically forces you to have URLs for wines in one of the following forms:

  • /origin/{id}/winery/{id}/wine/{id}
  • /wine/{opaque and unfriendly datastore key as a string}

(The first URL could of course be replaced with querystring parameters; the important part is that you need three different pieces of information to identify a given wine.)
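
As a rough illustration of the first URL form, the sketch below pulls the three pieces of information out of a URL and returns them as (kind, id) pairs. It assumes numeric ids; the pattern `WINE_URL` and the helper `wine_key_path` are hypothetical names, and no datastore call is made.

```python
import re

# Assumes numeric ids; key names would need a different pattern.
WINE_URL = re.compile(r'^/origin/(\d+)/winery/(\d+)/wine/(\d+)$')


def wine_key_path(url):
    """Turn a wine URL into the (kind, id) pairs needed to build its key."""
    match = WINE_URL.match(url)
    if match is None:
        return None  # the URL does not carry all three ids
    origin_id, winery_id, wine_id = (int(g) for g in match.groups())
    return (('Origin', origin_id), ('Winery', winery_id), ('Wine', wine_id))
```

In a real handler, the flattened pairs could presumably be handed to db.Key.from_path, which accepts a sequence of kind and id arguments; a URL such as /wine/789 returns None here because the parent ids are missing.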

Maybe there are other alternatives to these URL schemes that have not occurred to me, though.

answered Nov 17 '22 06:11 by Will McCutchen