Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a tool to check database integrity in Django?

The MySQL database powering our Django site has developed some integrity problems; e.g. foreign keys that refer to nonexistent rows. I won't go into how we got into this mess, but I'm now looking at how to fix it.

Basically, I'm looking for a script that scans all models in the Django site, and checks whether all foreign keys and other constraints are correct. Hopefully, the number of problems will be small enough so they can be fixed by hand.

I could code this up myself but I'm hoping that somebody here has a better idea.

I found django-check-constraints but it doesn't quite fit the bill: right now, I don't need something to prevent these problems, but to find them so they can be fixed manually before taking other steps.

Other constraints:

  • Django 1.1.1 and upgrading has been determined to break things
  • MySQL 5.0.51 (Debian Lenny), currently with MyISAM tables
  • Python 2.5, might be upgradable but I'd rather not right now

(Later, we will convert to InnoDB for proper transaction support, and maybe foreign key constraints on the database level, to prevent similar problems in the future. But that's not the topic of this question.)

like image 208
Thomas Avatar asked Jan 19 '11 11:01

Thomas


2 Answers

I whipped up something myself. The management script below should be saved in myapp/management/commands/checkdb.py. Make sure that intermediate directories have an __init__.py file.

Usage: ./manage.py checkdb for a full check; use --exclude app.Model or -e app.Model to exclude the model Model in the app app.

from django.core.management.base import BaseCommand, CommandError
from django.core.management.base import NoArgsCommand
from django.core.exceptions import ObjectDoesNotExist
from django.db import models
from optparse import make_option
from lib.progress import with_progress_meter

def model_name(model):
    return '%s.%s' % (model._meta.app_label, model._meta.object_name)

class Command(BaseCommand):
    args = '[-e|--exclude app_name.ModelName]'
    help = 'Checks constraints in the database and reports violations on stdout'

    option_list = NoArgsCommand.option_list + (
        make_option('-e', '--exclude', action='append', type='string', dest='exclude'),
    )

    def handle(self, *args, **options):
        # TODO once we're on Django 1.2, write to self.stdout and self.stderr instead of plain print

        exclude = options.get('exclude', None) or []

        failed_instance_count = 0
        failed_model_count = 0
        for app in models.get_apps():
            for model in models.get_models(app):
                if model_name(model) in exclude:
                    print 'Skipping model %s' % model_name(model)
                    continue
                fail_count = self.check_model(app, model)
                if fail_count > 0:
                    failed_model_count += 1
                    failed_instance_count += fail_count
        print 'Detected %d errors in %d models' % (failed_instance_count, failed_model_count)

    def check_model(self, app, model):
        meta = model._meta
        if meta.proxy:
            print 'WARNING: proxy models not currently supported; ignored'
            return

        # Define all the checks we can do; they return True if they are ok,
        # False if not (and print a message to stdout)
        def check_foreign_key(model, field):
            foreign_model = field.related.parent_model
            def check_instance(instance):
                try:
                    # name: name of the attribute containing the model instance (e.g. 'user')
                    # attname: name of the attribute containing the id (e.g. 'user_id')
                    getattr(instance, field.name)
                    return True
                except ObjectDoesNotExist:
                    print '%s with pk %s refers via field %s to nonexistent %s with pk %s' % \
                        (model_name(model), str(instance.pk), field.name, model_name(foreign_model), getattr(instance, field.attname))
            return check_instance

        # Make a list of checks to run on each model instance
        checks = []
        for field in meta.local_fields + meta.local_many_to_many + meta.virtual_fields:
            if isinstance(field, models.ForeignKey):
                checks.append(check_foreign_key(model, field))

        # Run all checks
        fail_count = 0
        if checks:
            for instance in with_progress_meter(model.objects.all(), model.objects.count(), 'Checking model %s ...' % model_name(model)):
                for check in checks:
                    if not check(instance):
                        fail_count += 1
        return fail_count

I'm making this a community wiki because I welcome any and all improvements to my code!

like image 73
2 revs, 2 users 99% Avatar answered Sep 18 '22 01:09

2 revs, 2 users 99%


Thomas' answer is great but is now a bit out of date. I have updated it as a gist to support Django 1.8+.

like image 32
racitup Avatar answered Sep 22 '22 01:09

racitup