I need to add a new column to a large (5M-row) Django table. I have a South schemamigration that creates the new column, and now I'm writing a datamigration script to populate it. It looks like this. (If you're not familiar with South migrations, just ignore the orm. prefix on the model name.)
print "Migrating %s articles." % orm.Article.objects.count()
cnt = 0
for article in orm.Article.objects.iterator():            
    if cnt % 500 == 0:
        print "    %s done so far" % cnt
    # article.newfield = calculate_newfield(article)
    article.save()
    cnt += 1
I switched from objects.all() to objects.iterator() to reduce memory requirements, but something is still chewing up vast amounts of memory when I run this script. Even with the actually useful line commented out as above, the script grows to 10+ GB of RAM before getting very far through the table, and I give up on it.
Seems like something is holding on to these objects in memory. How can I run this so it's not a memory hog?
FWIW, I'm using Python 2.6, Django 1.2.1, South 0.7.2, MySQL 5.1.
Ensure settings.DEBUG is set to False. With DEBUG=True, Django eats memory during database-intensive operations, because it stores every query it sends to the RDBMS in django.db.connection.queries, which grows without bound for the life of the connection.
With Django 1.8 out, this matters less, since a hardcoded maximum of 9000 queries is now stored, instead of an unbounded number as before.
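If you can't (or don't want to) flip DEBUG off for the migration run, you can periodically call django.db.reset_queries() to clear that log by hand. A minimal sketch, reusing the same loop from the question (reset_queries() is part of Django's public API and exists in 1.2):

from django import db

print "Migrating %s articles." % orm.Article.objects.count()
cnt = 0
for article in orm.Article.objects.iterator():
    if cnt % 500 == 0:
        # Drop the per-connection debug query log so it can't grow unbounded.
        db.reset_queries()
        print "    %s done so far" % cnt
    article.newfield = calculate_newfield(article)
    article.save()
    cnt += 1

This is what Django itself does at the end of every request (reset_queries() is hooked to the request_finished signal), but a long-running migration script never fires that signal, so the log has to be cleared manually.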