Tree structures in a nosql database

Tags:

nosql

google-app-engine

google-cloud-datastore

bigtable

I'm developing an application for Google App Engine which uses BigTable for its datastore.

It's an application about writing a story collaboratively. It's a very simple hobby project that I'm working on just for fun. It's open source and you can see it here: http://story.multifarce.com/

The idea is that anyone can write a paragraph, which then needs to be validated by two other people. A story can also be branched at any paragraph, so that another version of the story can continue in another direction.

Imagine the following tree structure (a binary tree diagram from Wikipedia): https://upload.wikimedia.org/wikipedia/commons/thumb/f/f7/Binary_tree.svg/220px-Binary_tree.svg.png

Every number would be a paragraph. I want to be able to select all the paragraphs in every unique story line. Basically, those unique story lines are (2, 7, 2); (2, 7, 6, 5); (2, 7, 6, 11) and (2, 5, 9, 4). Ignore that the node "2" appears twice; I just took a tree structure diagram from Wikipedia.

I also made a diagram of a proposed solution: https://docs.google.com/drawings/edit?id=1fdUISIjGVBvIKMSCjtE4xFNZxiE08AoqvJSLQbxN6pc&hl=en

How can I set up a structure that is performance-efficient for writing and, most importantly, for reading?


asked Jul 19 '10 13:07

Blixt

1 Answer

There are a number of well-known ways to represent trees in databases; each has its pros and cons. Here are the most common:

Adjacency list, where each node stores the ID of its parent.
Materialized path, which is the strategy Keyur describes. This is also the approach used by entity groups (e.g., parent entities) in App Engine. It's also more or less what you're describing in your update.
Nested sets, where each node has 'left' and 'right' IDs, such that all child nodes are contained in that range.
Adjacency lists augmented with a root ID.

Each of these has its own advantages and disadvantages. Adjacency lists are simple and cheap to update, but require multiple queries to retrieve a subtree (one for each parent node). Augmented adjacency lists make it possible to retrieve an entire tree in a single query by storing the ID of the root node in every record.
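As a rough illustration of the difference, here is a minimal in-memory sketch. The NODES dictionary, its IDs, and the function names are hypothetical stand-ins for a table of records, not datastore API; each call to children_of() plays the role of one query.

```python
# Hypothetical stand-in for a table of nodes: id -> parent ID and root ID.
NODES = {
    1: {"parent": None, "root": 1},
    2: {"parent": 1, "root": 1},
    3: {"parent": 1, "root": 1},
    4: {"parent": 2, "root": 1},
    5: {"parent": 4, "root": 1},
}


def children_of(parent_id):
    """Simulates one query: all nodes whose parent is parent_id."""
    return sorted(n for n, row in NODES.items() if row["parent"] == parent_id)


def subtree_adjacency(node_id):
    """Plain adjacency list: walk the subtree level by level.

    Returns the node IDs found and the number of simulated queries.
    This sketch issues one query per visited node, because it cannot
    know in advance which nodes are leaves.
    """
    result, queries, frontier = [], 0, [node_id]
    while frontier:
        current = frontier.pop(0)
        result.append(current)
        frontier.extend(children_of(current))
        queries += 1
    return result, queries


def tree_by_root(root_id):
    """Augmented adjacency list: a single filter on the stored root ID."""
    return sorted(n for n, row in NODES.items() if row["root"] == root_id)


nodes, query_count = subtree_adjacency(2)
print(nodes, query_count)
print(tree_by_root(1))
```

The root ID only helps when you want the whole tree; retrieving an arbitrary subtree still needs the repeated lookups.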

Materialized paths are easy to implement and cheap to update, and permit querying arbitrary subtrees, but impose increasing overhead for deep trees.
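A minimal sketch of the idea, assuming each node stores its full path from the root as a string. The PATHS dictionary, the "/" separator, and the function names are illustrative assumptions; in a real datastore the prefix match would typically be expressed as a range query on the path property.

```python
# Hypothetical stand-in for stored records: node name -> path from the root.
PATHS = {
    "a": "/a/",
    "b": "/a/b/",
    "c": "/a/c/",
    "d": "/a/b/d/",
    "e": "/a/b/d/e/",
}


def subtree_by_path(path_prefix):
    """All nodes whose stored path starts with the given prefix."""
    return sorted(name for name, path in PATHS.items()
                  if path.startswith(path_prefix))


def ancestors(name):
    """Ancestors can be read straight off the stored path, root first."""
    return PATHS[name].strip("/").split("/")[:-1]


print(subtree_by_path("/a/b/"))
print(ancestors("e"))
# The stored key grows with depth, which is the overhead for deep trees.
print(len(PATHS["a"]), len(PATHS["e"]))
```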

Nested sets are tougher to implement, and require updating, on average, half the nodes each time you make an insertion. They allow you to query arbitrary subtrees without the growing key length that materialized paths suffer from.
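To make the trade-off concrete, here is a small in-memory sketch of nested sets. The NESTED dictionary, the node names, and both functions are hypothetical; the insertion shown appends a new last child and counts how many existing nodes had to be renumbered.

```python
# Hypothetical stand-in for stored records: node name -> (left, right).
# Tree: a has children b and c; b has child d; d has child e.
NESTED = {
    "a": (1, 10),
    "b": (2, 7),
    "d": (3, 6),
    "e": (4, 5),
    "c": (8, 9),
}


def subtree_by_range(name):
    """A node's subtree is every node whose range lies inside its own."""
    left, right = NESTED[name]
    return sorted(n for n, (l, r) in NESTED.items()
                  if left <= l and r <= right)


def insert_child(parent, name):
    """Insert name as the last child of parent.

    Every left/right value at or beyond the parent's right value shifts
    by 2 to make room. Returns how many existing nodes were updated.
    """
    _, parent_right = NESTED[parent]
    updated = 0
    for node, (l, r) in list(NESTED.items()):
        new_l = l + 2 if l >= parent_right else l
        new_r = r + 2 if r >= parent_right else r
        if (new_l, new_r) != (l, r):
            NESTED[node] = (new_l, new_r)
            updated += 1
    NESTED[name] = (parent_right, parent_right + 1)
    return updated


print(subtree_by_range("b"))
print(insert_child("d", "f"))
print(subtree_by_range("d"))
```

Reads stay a single range comparison, while a write touches every node to the right of the insertion point, which is where the update cost comes from.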

In your specific case, though, it seems like you don't actually need a tree structure at all: each story, branched off an original though it may be, stands alone. What I would suggest is having a 'Story' model, which contains a list of keys of its paragraphs (e.g., in Python, a db.ListProperty(db.Key)). To render a story, you fetch the Story, then do a batch fetch for all the Paragraphs. To branch a story, simply duplicate the Story entry, leaving the references to Paragraphs unchanged.
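The suggestion can be sketched in plain Python, with dictionaries standing in for the datastore. PARAGRAPHS, STORIES, and the helper functions are hypothetical names for illustration; in App Engine the list of keys would live in something like the db.ListProperty(db.Key) mentioned above. Truncating the copied list at the branch point is my assumption about how "branching at a paragraph" would work.

```python
import itertools

PARAGRAPHS = {}  # stand-in for Paragraph entities: key -> text
STORIES = {}     # stand-in for Story entities: key -> list of paragraph keys
_ids = itertools.count(1)


def add_paragraph(story_key, text):
    """Store a new paragraph and append its key to the story's list."""
    key = "p%d" % next(_ids)
    PARAGRAPHS[key] = text
    STORIES[story_key].append(key)
    return key


def render(story_key):
    """Fetch the story, then 'batch fetch' all of its paragraphs."""
    keys = STORIES[story_key]
    return [PARAGRAPHS[k] for k in keys]


def branch(source_key, new_key, length=None):
    """Duplicate the story's key list; the paragraphs are shared, not copied.

    length (an assumption) keeps only the first `length` paragraphs, so
    the new story can continue from a branch point.
    """
    STORIES[new_key] = list(STORIES[source_key][:length])


STORIES["main"] = []
add_paragraph("main", "Once upon a time.")
add_paragraph("main", "A dragon appeared.")
add_paragraph("main", "The knight fled.")

branch("main", "alt", length=2)
add_paragraph("alt", "The knight fought.")

print(render("main"))
print(render("alt"))
```

Reading a story line costs two fetches regardless of how many branches exist, and a branch writes one new Story record without touching any Paragraph.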


answered Sep 20 '22 00:09

Nick Johnson
