I have Django webapp with different content (e.g. Posts, Threads, comments) and I want to keep track of their history. So I'm trying to create a "version-control" for "content". This is how I'm doing it (skip to the end for the question):
I have a model Create_Content
which represents an action: has an actor (editor
), a set of modifications (edition
), a timestamp, and the content it acts on:
class Create_Content(models.Model):
editor = models.ForeignKey('User')
edition = models.TextField() # pickled dictionary
date = models.DateTimeField(auto_now_add=True)
content = models.ForeignKey('Content')
which is saved exactly once because history cannot be changed.
I define content has:
class Content(models.Model):
last_edit = models.ForeignKey(Create_Content, related_name='content_last_id')
first_edit = models.ForeignKey(Create_Content, related_name='content_first_id')
def author(self):
return self.first_edit.editor
where last_edit
is the last action performed it, and first_edition
I use to get the initial author of the content.
All the actual content of the webapp is implemented by derived models of Content, which are created/edited by derived models of Create_Content
. For instance,
class Comment(Content):
post = models.ForeignKey(Post, related_name="comment_set")
body = models.TextField()
is created during the save of Create_Comment(Create_Content)
(in fact all subclasses of Create_Content
are proxies).
For instance, user editions are instances of a subclass of Create_Content
(e.g. Edit_Comment(Create_Content)
) which, during save()
, makes last_edit
to point onto self
and updates the actual content (e.g. Edit_Comment.body
).
In the end I'm doing a simplified version of Git for a database:
Create_Content
are commits (which store the complete state in self.edition
).Content
is a specific branch+working directory (last_edit
being the pointer),Content
's to the new content (e.g. body
of the Comment
).What I realized is that every user action will have an entry in the table of Create_Content
.
My question are:
how bad is this to happen? I mean, this table can be pretty big, and all actions will be hitting it.
I think my approach is "clean", but I'm pretty sure I'm re-inventing the wheel. What is this specific wheel?
Django-versioning allows you to version the data stored in django models, and stores only diff, not content copy. Supports all field types excepts ManyToMany (currently). Maybe also useful for your project: the Django built-in comments app and the way of referring to objects.
Simply type python -m django --version or type pip freeze to see all the versions of installed modules including Django.
You can try: https://bitbucket.org/emacsway/django-versioning. Django-versioning allows you to version the data stored in django models, and stores only diff, not content copy. Supports all field types excepts ManyToMany (currently).
Maybe also useful for your project: the Django built-in comments app and the way of referring to objects. I think this is a nice and clean approach.
The physical database size doesn't matter. The number of records doesn't matter. See: How big can a MySQL database get before performance starts to degrade
But how useful is it to keep a full history? You could start to delete history like TimeMachine does. Have daily (for a week), weekly (for some months) and monthly records. Crontab or Django-Celery help you to delete old history.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With