Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Django: storing model property on a field vs. on a different model

I am relatively new to Django and even database design and I have some thoughts I'd like to run by some other people. This isn't really a specific question; I just want to see how other people think about this stuff.

Let's say we have a model for an application to some service. It contains all the ordinary stuff you might imagine an application to contain:

class Application(models.Model):
  first_name = CharField(max_length=255)
  last_name = CharField(max_length=255)
  date_of_birth = DateField()
  married = BooleanField()
  # ...other stuff

Okay, that's all well and good. But now, imagine the webapp you are writing has the feature that you can complete your application partially, save it, and come back to it later. One way to do this is to add another attribute to the model above:

complete = BooleanField()

It works, it is pretty simple to use, but I don't really like it because it muddies the semantics of an application; it adds information that isn't intrinsically connected to the application. Another approach would be to create another model that keeps track of complete applications:

class CompleteApplication(models.Model):
  application = ForeignKey(Application)

I like this a bit better, since it keeps Application clean. However, it does have the disadvantage of messing up queries. Here are the two ways to query all complete applications in the system:

Method 1:

completed_applications = Application.objects.filter(complete=True)

Method 2:

pks = CompleteApplication.objects.all().values_list("application__pk")
complete_applications = Application.object.filter(pk__in=pks)

Method 2 is two lines of code vs. one and also two queries whereas previously one sufficed, so the database performance is going to take a hit.

There is a third way to do things: instead of creating a model that keeps track of complete applications, we could create a metadata model that stores any metadata that we might want to attach to the Application model. For our purposes, this model can contain a field that tracks completeness. However, this approach also has the benefit of allowing for an arbitrary number of metadata fields to be associated with each application without requiring a new DB table for each (as is the case with Method 2 above).

class ApplicationMeta(models.Model):
  application = ForeignKey(Application)
  complete = BooleanField()

And, for completeness (pun intended), to query all complete applications, we would use the following statement:

completed_applications = Application.objects.all(applicationmeta__complete=True)

Nice and simple, just like Method 1, but the query is certainly more work for the database. This method also has another drawback for certain applications. Pretend, for example, that we want to track some additional information about applications: they can be confirmed, or rejected. However, if an application is not confirmed, it does NOT necessarily mean it is rejected: it could be pending review. Additionally, let's say we want to track the date of confirmation and the date of rejection (if either is applicable, of course). Then, our metadata model becomes the following:

class ApplicationMeta(models.Model):
  complete = BooleanField()
  confirmed = BooleanField()
  rejected = BooleanField()
  date_confirmed = DateField()
  date_rejected = DateField()

Okay...this works, but it is starting to be a mess. Firstly, we have now opened up our system to potential error: what if somehow an ApplicationMeta instance has both rejected and confirmed set to True? We could do some fancy footwork with our class (maybe override setattr) to throw an exception if something funny happens, so we can prevent from persisting to the DB, but this is added complication that I hope is not necessary. Further, any model will either have at most one of date_confirmed or date_rejected set. Is that a problem? Here, I am not actually certain. My guess is this is likely a waste of space, but I don't actually know that. This example is simple, what if more complicated examples present us with tons of fields that will necessarily not be filled? Seems like bad design.

I'd love to hear some thoughts on these ideas.

Thanks!

like image 793
Joshua Gevirtz Avatar asked Nov 01 '22 09:11

Joshua Gevirtz


1 Answers

If you have a huge amount of possible metadata, the third approach might make sense for performance reasons. I wouldn't do it for a few boolean- and date columns. If you're concerned about the readability of the models themselves, you can factor out any metadata into an abstract base model. You can even reuse the abstract model for other models that require the same metadata. The information will still live in your Application model.

If you do take the second or third approach, I would use a OneToOneField rather than a ForeignKey. It ensures that there are no 2 possible ApplicationMeta models for a single Application, and has the added benefit of a UNIQUE database index.

As for the status of an application, the NullBooleanField is designed for exactly that. It start as None (NULL in the db) meaning "no value". It can then be set to True (accepted) or False (rejected).

like image 77
knbk Avatar answered Nov 13 '22 18:11

knbk