Is it safe to do a data migration as just one operation in a larger Django migration?

Tags: python, django

I am handling what I assume is a common issue: I've realized that an existing field on the model Foo would be better as a completely separate model Bar with a foreign key to Foo. So we need a schema migration. What's more, since there is already data in that field of Foo, we also need a data migration before we delete the field.

So we have identified that there are three distinct steps to take:

  1. Create the new table Bar
  2. Migrate the existing data in Foo to the new table Bar
  3. Delete the existing field in Foo

First, I make all the needed model changes in models.py, and then auto-generate a migration. Everything looks good, except we're going to lose all the data in the field, so I need to add one extra operation to handle the data migration (RunPython). I would end up with something like the following:

from django.db import migrations


def do_data_migration(apps, schema_editor):
    # Migrate data from Foo to Bar
    pass

class Migration(migrations.Migration):

    dependencies = [ 
        ('exampleapp', 'migration_003'),
    ]   

    operations = [ 
        migrations.CreateModel(
            # Create the new model Bar
        ),  
        migrations.AddField(
            # Add the foreign key field to model Foo
        ),  
        migrations.RunPython(
            do_data_migration
        ),
        migrations.RemoveField(
            # Remove the old field from Foo
        ),  
    ]

Is it safe to run a data migration as one of several operations in a migration? My worries are that some sort of locking might be going on, or that the app registry that RunPython passes to do_data_migration might not be up to date with the preceding operations.

I am aware that I could create three migrations: one for CreateModel and AddField, a second for RunPython, and a last one for RemoveField. The question is whether it is functionally equivalent to do all four operations in a single migration (which has the added benefit of making the entire migration easier to understand).

fildred13 asked Jun 24 '16 23:06


1 Answer

With regard to Django itself, this is perfectly safe. Each operation receives the correct state based on all previous migrations and on the preceding operations within the same migration. Your RunPython operation will receive an app registry that includes the new Bar model and still has the old field on Foo.
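To make that concrete, here is a minimal sketch of what do_data_migration might look like. The field names are assumptions, since the question does not give them: old_field is the field being removed from Foo, and foo and value are the foreign key and the data field on Bar.

```python
def do_data_migration(apps, schema_editor):
    # Fetch the historical models from the app registry passed in by
    # RunPython instead of importing them from models.py.
    Foo = apps.get_model("exampleapp", "Foo")
    Bar = apps.get_model("exampleapp", "Bar")
    db_alias = schema_editor.connection.alias

    # Hypothetical field names: Foo.old_field, Bar.foo, Bar.value.
    bars = [
        Bar(foo=foo, value=foo.old_field)
        for foo in Foo.objects.using(db_alias).all()
    ]
    Bar.objects.using(db_alias).bulk_create(bars)
```

Because the models come from apps.get_model, the function sees them as they stand at this point in the migration: Bar already exists and Foo still has its old field.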

What may not be safe is the database side of the operation. If a database supports DDL (Data Definition Language) in transactions, Django will run the complete migration in a single transaction. PostgreSQL, for example, supports DDL in transactions, but mixing schema changes and data changes in the same transaction is not reliable there: altering a table that still has pending trigger events from earlier data changes fails with an error such as `cannot ALTER TABLE "..." because it has pending trigger events`. Attempting to do both within a single migration/transaction can therefore result in an error.

If you use MySQL or Oracle, which do not support DDL transactions and will only run the RunPython operation in a transaction, you can safely put all operations in the same migration. However, you will lose out on some cross-database compatibility.

knbk answered Oct 12 '22 11:10