Relational vs Non-Relational Data Modeling - what's the difference?

I'm new to databases and I've never worked with any RDBMS. However, I get the basic idea of relational databases. At least I think I do ;-)

Let's say I have a user database with the following properties for each user:

user
    id
    name
    zip
    city

In a relational database I would, for example, model it with a table called user:

user
    id
    name
    location_id

and have a second table called location:

location
    id
    zip
    city

Here location_id is a foreign key (a reference) to an entry in the location table. If I understand it correctly, the advantage is that if the zip code for a certain city changes, I only have to change exactly one entry.
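To make the relational version concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the question; the sample rows (alice, bob, Berlin and its zip codes) and the helper users_with_zip are made up for illustration.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (
        id   INTEGER PRIMARY KEY,
        zip  TEXT NOT NULL,
        city TEXT NOT NULL
    );
    CREATE TABLE user (
        id          INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        location_id INTEGER NOT NULL REFERENCES location(id)
    );
""")

# Hypothetical sample data: two users sharing one location row.
conn.execute("INSERT INTO location (id, zip, city) VALUES (1, '10115', 'Berlin')")
conn.executemany(
    "INSERT INTO user (name, location_id) VALUES (?, ?)",
    [("alice", 1), ("bob", 1)],
)


def users_with_zip(conn):
    """Join user to location so every row carries the user's zip code."""
    query = """
        SELECT user.name, location.zip
        FROM user JOIN location ON user.location_id = location.id
        ORDER BY user.name
    """
    return conn.execute(query).fetchall()


rows_before = users_with_zip(conn)

# The zip code changes: exactly one row in location has to be updated.
changed = conn.execute("UPDATE location SET zip = '10117' WHERE id = 1").rowcount
rows_after = users_with_zip(conn)
```

Both users pick up the new zip code through the join, even though only the single location row was written.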

Now to the non-relational database: I have started to play around with Google App Engine. Here I would model the data just as it was first written down in the specification. I have a kind User:

    from google.appengine.ext import db

    class User(db.Model):
        name = db.StringProperty()
        zip = db.StringProperty()
        city = db.StringProperty()

The advantage is that I don't need to join two "tables", but the disadvantage is that if the zip code changes, I have to run a script that goes through all user entries and updates the zip code, correct?
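As a rough illustration of that update script, the sketch below uses plain Python dictionaries as a stand-in for denormalised User entities. It does not use the App Engine API; the function change_zip and the sample users are hypothetical.

```python
# In-memory stand-in for denormalised User entities (not App Engine code).
users = [
    {"name": "alice", "zip": "10115", "city": "Berlin"},
    {"name": "bob", "zip": "10115", "city": "Berlin"},
    {"name": "carol", "zip": "80331", "city": "Munich"},
]


def change_zip(users, old_zip, new_zip):
    """Rewrite every user that carries old_zip; return how many were touched."""
    touched = 0
    for user in users:
        if user["zip"] == old_zip:
            user["zip"] = new_zip
            touched += 1
    return touched


# One logical change to a zip code means one write per affected user.
touched = change_zip(users, "10115", "10117")
```

The number of writes grows with the number of users who share the old zip code, which is the cost you pay for not needing a second lookup when reading.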

Now, there is another option in Google App Engine, which is to use ReferenceProperties. I could have two kinds, User and Location:

class Location(db.Model):
    zip = db.StringProperty()
    city = db.StringProperty()

class User(db.Model):
    name = db.StringProperty()
    location = db.ReferenceProperty(Location)

If I'm not wrong, I now have exactly the same model as in the relational database described above. What I'm wondering now is, first of all, whether what I just did is wrong, and whether it destroys all the advantages of a non-relational database. I understand that in order to get the values of zip and city I have to run a second query. But in the other case, to change a zip code I have to run through all existing users.

So what are the implications of these two modeling possibilities in a non-relational database like Google's datastore? And what are typical use cases for each of them, meaning when should I use one and when the other?

Also, as an additional question: if in a non-relational database I can model exactly the same thing that I can model in a relational database, why should I use a relational database at all?

Sorry if some of these questions sound naive, but I'm sure they will help a few people who are new to database systems gain a better understanding.

asked May 13 '11 16:05

znq

2 Answers

In my experience, the biggest difference is that non-relational datastores force you to model based on how you'll query, because of the lack of joins, and how you'll write, because of the transaction restrictions. This of course results in very denormalized models. After a while, I started to define all the queries first, to avoid having to rethink the models later.

Because of the flexibility of relational DBs, you can think about each data family separately, create relations between them, and in the end query however you wish (abusing joins in so many cases).

answered Oct 24 '22 09:10

moraes

Imagine that GAE has two modes for the Datastore: RDBMS mode and non-RDBMS mode. Take your ReferenceProperty example with the aim of "list all the users and all their zip codes", and write some code to print them.

For the [fictional] RDBMS-mode Datastore it might look like:

for user in User.all().join("location"):
    print("name: %s zip: %s" % (user.name, user.location.zip))

Our RDBMS has handled the de-normalisation of the data behind the scenes and done a nice job of returning all the data we needed in one query. This query did have a little bit of overhead, as it had to stitch together our two tables.

For the non-RDBMS Datastore our code might look like:

    for user in User.all():
        # † Not valid App Engine code; see the note below.
        location = Location.get(user.location)
        print("name: %s zip: %s" % (user.name, location.zip))

Here the Datastore cannot help us join our data, and we must make an extra query for each and every user entity to fetch the location before we can print it.

This is, in essence, why you want to avoid overly normalised data on non-RDBMS systems.

Now, everybody logically normalizes their data to some extent, whether they are using an RDBMS or not. The trick is to find the trade-off between convenience and performance for your use case.

† This is not valid App Engine code; I'm just illustrating that user.location would trigger a db query. Also, no one should write code like my extreme example above: you can avoid repeatedly fetching related entities by, say, fetching the locations in batches up front.
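The batching idea from the note can be sketched with a small in-memory stand-in that counts round trips. Everything here is hypothetical and is not the App Engine API: FakeDatastore, its get and get_multi methods, and the sample keys exist only to compare one lookup per user against a single batched lookup.

```python
class FakeDatastore:
    """Hypothetical in-memory store that counts how often it is asked for data."""

    def __init__(self, entities):
        self._entities = entities
        self.round_trips = 0

    def get(self, key):
        """Fetch one entity: one round trip per call."""
        self.round_trips += 1
        return self._entities[key]

    def get_multi(self, keys):
        """Fetch many entities at once: one round trip for the whole batch."""
        self.round_trips += 1
        return {key: self._entities[key] for key in keys}


LOCATIONS = {
    "loc1": {"zip": "10115", "city": "Berlin"},
    "loc2": {"zip": "80331", "city": "Munich"},
}
USERS = [
    {"name": "alice", "location": "loc1"},
    {"name": "bob", "location": "loc1"},
    {"name": "carol", "location": "loc2"},
]


def zips_one_by_one(store, users):
    """The extreme example: one extra lookup for each and every user."""
    return [(u["name"], store.get(u["location"])["zip"]) for u in users]


def zips_batched(store, users):
    """Collect the distinct location keys first, then fetch them in one batch."""
    keys = {u["location"] for u in users}
    by_key = store.get_multi(keys)
    return [(u["name"], by_key[u["location"]]["zip"]) for u in users]


naive_store = FakeDatastore(LOCATIONS)
naive_result = zips_one_by_one(naive_store, USERS)

batched_store = FakeDatastore(LOCATIONS)
batched_result = zips_batched(batched_store, USERS)
```

Both functions return the same list, but the naive version needs as many round trips as there are users, while the batched version needs one regardless of how many users there are.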

"If in a non-relational database I can model exactly the same thing that I can model in a relational database, why should I use a relational database at all?"

Relational DBs excel at storing thousands to millions of rows of complex, inter-related data models, and at letting you perform incredibly intricate queries to reshape and access that data.

Non-relational DBs excel at storing billions (or more) of rows of simple data, and at letting you fetch that data with simpler queries.

The choice should really depend on your use case. The simpler structure of the non-relational model, and the design constraints that come with it, is one of the main ways that App Engine is able to promise to scale your app with demand.

answered Oct 24 '22 10:10

Chris Farmiloe
