Efficient data migration on a large Django table

I need to add a new column to a large (5 million row) Django table. I have a South schemamigration that creates the new column, and now I'm writing a datamigration script to populate it. It looks like this. (If you're not familiar with South migrations, just ignore the orm. prefix on the model name.)

print "Migrating %s articles." % orm.Article.objects.count()
cnt = 0
for article in orm.Article.objects.iterator():            
    if cnt % 500 == 0:
        print "    %s done so far" % cnt
    # article.newfield = calculate_newfield(article)
    article.save()
    cnt += 1

I switched from objects.all() to objects.iterator() to reduce memory requirements, but something is still chewing up vast amounts of memory when I run this script. Even with the actually useful line commented out as above, the script grows to using 10+ GB of RAM before it gets very far through the table, at which point I give up on it.

Seems like something is holding on to these objects in memory. How can I run this so it's not a memory hog?

FWIW, I'm using Python 2.6, Django 1.2.1, South 0.7.2, and MySQL 5.1.

Asked Jun 07 '11 by Leopd

1 Answer

Ensure settings.DEBUG is set to False. With DEBUG = True, Django stores every query it sends to the RDBMS on the database connection object (django.db.connection.queries), so memory fills up quickly during database-intensive operations like this one.
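
If turning DEBUG off isn't an option for some reason, the query log can also be cleared by hand. Below is a minimal sketch of the question's loop doing that with django.db.reset_queries(), a long-standing Django helper that empties the accumulated log; calculate_newfield is the question's own placeholder, not a real function.

from django import db

cnt = 0
for article in orm.Article.objects.iterator():
    if cnt % 500 == 0:
        print "    %s done so far" % cnt
        # Drop the query log that DEBUG = True accumulates,
        # so it can't grow without bound over 5 million rows.
        db.reset_queries()
    article.newfield = calculate_newfield(article)
    article.save()
    cnt += 1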

As of Django 1.8 this is less of an issue: the query log is capped at a hardcoded maximum of 9000 queries, instead of growing without bound as it did before.
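
As a quick sanity check, the size of the accumulated log can be inspected directly; if this number keeps climbing while the migration runs, the query log is the culprit.

from django.db import connection
print len(connection.queries)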

Answered Nov 10 '22 by Steve K