
Best practice when adding a new unique field to an existing Django model

I have an existing model that looks somewhat like the following...

class Resource(models.Model):

    id = models.AutoField(primary_key=True)

We have been using this for some time, and now have ~1M instances of these Resource objects (and associated ForeignKey and other usages) in our database.

I now need to track another ID on this model, one that I want to enforce as unique.

other_id = models.IntegerField(unique=True)

This other_id information is currently stored in some external CSVs, and I want to (at some point in the process) load it into all existing Resource instances.

After adding the above field, Django's makemigrations works just fine. However, when I go to apply that migration against an existing database, I get an error indicating that I need to provide a default to use for all existing Resource instances. I'm sure many of you have seen something similar.

What is the best approach to getting around this limitation? Some methods I have thought of...

  1. Remove the unique=True requirement, then:
    • apply the migration
    • externally load in the other_id value to all existing models (through some management command, or 1-off script)
    • add the unique=True back in and apply the migration
  2. Dump all existing data to JSON, then:
    • flush all tables
    • apply the migration (with unique=True)
    • write a script that loads the data back in, adding the correct other_id value
  3. (unsure if this is possible) Write some custom migration logic to automatically reference these external CSVs to load other_id values when I run manage.py migrate. This could hit issues if (at some point in the future) someone re-runs these migrations and this part fails (cannot find existing resource id in the CSVs to pull out other_id).

All of these feel complicated, but then again I guess what I am trying to do isn't the simplest thing either.

Any ideas? I have to imagine someone has had to work around a similar issue in the past.

Thanks!

wakey asked Jul 08 '19 12:07



2 Answers

Actually, the source of your issue is not the unique constraint by itself but the fact that your field doesn't allow nulls and has no default value. You'd have the very same error with a non-unique field.

The proper solution here is to allow the field to be null (null=True) and default it to None (which will translate to SQL "null"). Since null values are excluded from unique constraints (at least if your db vendor respects the SQL standard), this allows you to apply the schema change while still making sure you cannot have a duplicate for non-null values.
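To see the null behaviour concretely, here is a minimal sketch using Python's built-in sqlite3 module rather than Django. The table and column names mirror the question but are only illustrative, and other databases may behave differently, so it is worth running the equivalent check against your own backend.

```python
import sqlite3

# In-memory database; "resource" and "other_id" are illustrative names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource (id INTEGER PRIMARY KEY, other_id INTEGER UNIQUE)"
)

# Several rows may hold NULL in a UNIQUE column at the same time.
conn.executemany(
    "INSERT INTO resource (id, other_id) VALUES (?, ?)",
    [(1, None), (2, None), (3, 42)],
)

# A second non-null 42 violates the constraint.
try:
    conn.execute("INSERT INTO resource (id, other_id) VALUES (4, 42)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

null_count = conn.execute(
    "SELECT COUNT(*) FROM resource WHERE other_id IS NULL"
).fetchone()[0]
print(duplicate_rejected, null_count)
```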

Then you may want a data migration to load the known "other_id" values, and eventually a third schema migration to disallow null values for this field - if and only if you know you have filled this field for all records.
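Before disallowing nulls, it may help to check that the CSV data is itself unique and covers every existing row. The following is a rough sketch under assumptions: the CSV layout (columns named id and other_id) and the helper names load_other_ids and missing_ids are hypothetical, since the question does not describe the files.

```python
import csv
import io


def load_other_ids(csv_file, id_column="id", other_id_column="other_id"):
    """Return {resource id: other_id}, raising ValueError on any duplicate."""
    mapping = {}
    owner_of = {}  # other_id -> resource id that already claimed it
    for row in csv.DictReader(csv_file):
        resource_id = int(row[id_column])
        other_id = int(row[other_id_column])
        if resource_id in mapping:
            raise ValueError(f"resource id {resource_id} appears more than once")
        if other_id in owner_of:
            raise ValueError(
                f"other_id {other_id} is used by resources "
                f"{owner_of[other_id]} and {resource_id}"
            )
        mapping[resource_id] = other_id
        owner_of[other_id] = resource_id
    return mapping


def missing_ids(existing_ids, mapping):
    """Return the existing resource ids that have no other_id in the mapping."""
    return sorted(set(existing_ids) - set(mapping))


# Tiny in-memory sample standing in for a real CSV file.
sample = io.StringIO("id,other_id\n1,100\n2,200\n")
mapping = load_other_ids(sample)
print(mapping, missing_ids([1, 2, 3], mapping))
```

If missing_ids returns anything, the third migration would fail on those rows, so they need a value (or a decision to keep the field nullable) first.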

bruno desthuilliers answered Nov 03 '22 05:11


Django has something called Data Migrations, where you create a migration file that modifies, removes, or adds data in your database as you apply your migrations.

In this case you would create 3 different migrations:

  1. Create a migration that allows null values with null=True.
  2. Create a data migration that populates the data.
  3. Create a migration that disallows null values by removing the null=True added in step 1.

When you then run python manage.py migrate, it applies the migrations from steps 1-3 in the correct order.

Your data migration would look something like this:

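from django.db import migrations


def random_id_generator():
    # Placeholder: replace with logic that returns the real other_id
    # for a row (for example, a lookup in the CSV data).
    raise NotImplementedError


def populate_reference(apps, schema_editor):
    # Use the historical model rather than importing it from models.py.
    MyModel = apps.get_model('yourappname', 'MyModel')
    for obj in MyModel.objects.all():
        obj.other_id = random_id_generator()
        # Only write the column being populated.
        obj.save(update_fields=['other_id'])


class Migration(migrations.Migration):

    dependencies = [
        # Should point at the migration from step 1 that added the nullable
        # field; '0001_initial' is only an example name.
        ('yourappname', '0001_initial'),
    ]

    operations = [
        # The noop reverse lets this migration be unapplied without error.
        migrations.RunPython(populate_reference, migrations.RunPython.noop),
    ]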

You can create an empty migration file using the ./manage.py makemigrations --empty yourappname command.
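With around 1M rows, saving one object at a time issues one UPDATE per row. A possible alternative, sketched here under assumptions, is to write in batches with bulk_update (available from Django 2.2). The names fill_other_ids and batches, the mapping argument (a dict of resource id to other_id), and the batch size are all hypothetical; inside a migration you would pass in the model returned by apps.get_model.

```python
from itertools import islice


def batches(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def fill_other_ids(model, mapping, batch_size=1000):
    """Set other_id from `mapping` ({id: other_id}); return rows updated.

    `model` is expected to be a Django model class, for example the result
    of apps.get_model('yourappname', 'MyModel') inside a RunPython function.
    """
    updated = 0
    for id_chunk in batches(sorted(mapping), batch_size):
        # Materialise each batch before writing, so reads and writes
        # on the same table do not overlap.
        objs = list(model.objects.filter(id__in=id_chunk))
        for obj in objs:
            obj.other_id = mapping[obj.id]
        if objs:
            model.objects.bulk_update(objs, ["other_id"])
            updated += len(objs)
    return updated
```

Ids in the mapping that match no row are skipped silently here; comparing the returned count with the table's row count is one way to notice gaps before removing null=True.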

Marcus Lind answered Nov 03 '22 05:11