
What is the most efficient way to iterate django objects updating them?

So I have a queryset to update

stories = Story.objects.filter(introtext="")
for story in stories:
    #just set it to the first 'sentence'
    story.introtext = story.content[0:(story.content.find('.'))] + ".</p>" 
    story.save()

The save() call completely kills performance. The process list also shows multiple entries for "./manage.py shell" (yes, I ran this through the Django shell).

However, in the past I've run scripts that didn't need to call save() because they only changed a many-to-many field, and those scripts performed very well. My project has this code, which could be relevant to why they were so fast:

@receiver(signals.m2m_changed, sender=Story.tags.through)
def save_story(sender, instance, action, reverse, model, pk_set, **kwargs):
    instance.save()

What is the best way to update a large queryset (10000+) efficiently?

straykiwi asked Mar 18 '23 07:03



1 Answer

Because the new introtext value depends on each object's content field, you can't easily do this with a single queryset update() call. But you can speed up saving the individual objects by wrapping the loop in a transaction:

from django.db import transaction

with transaction.atomic():
    stories = Story.objects.filter(introtext='')
    for story in stories:
        introtext = story.content[0:(story.content.find('.'))] + ".</p>" 
        Story.objects.filter(pk=story.pk).update(introtext=introtext)
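
One caveat with the slice used above: str.find returns -1 when content contains no period, so content[0:-1] silently drops the last character before ".</p>" is appended. A minimal sketch of a guard follows; the helper name first_sentence_intro and the fallback of returning the content unchanged are assumptions, not part of the original code.

```python
def first_sentence_intro(content):
    """Return content up to its first period, closed with '.</p>'.

    Hypothetical helper. Assumption: when there is no period at all,
    the content is returned unchanged instead of being truncated.
    """
    end = content.find(".")
    if end == -1:
        # find() signals "not found" with -1; slicing with it would
        # cut off the final character, so fall back to the full text.
        return content
    return content[:end] + ".</p>"


print(first_sentence_intro("<p>First sentence. Second one.</p>"))
print(first_sentence_intro("<p>No full stop here</p>"))
```

Inside the loop, introtext = first_sentence_intro(story.content) would then replace the inline slice.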

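Django runs in autocommit mode by default, so every save() is committed as its own transaction. Wrapping the loop in transaction.atomic() commits once at the end instead, which can increase speed by an order of magnitude.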

The filter(pk=story.pk).update() trick avoids the pre_save/post_save signals that a plain save() emits, and it writes only the introtext column. The Django documentation describes update() as the most efficient way to change a field when you don't need to do anything else with the model object.
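
On Django 2.2 and later, QuerySet.bulk_update is another option: the values are computed in Python and then written back in batches rather than with one query per row. Like update(), it does not call save() or send pre_save/post_save signals. The sketch below is only an illustration; fill_introtexts and its make_intro parameter are hypothetical names, and it assumes the matching rows fit comfortably in memory.

```python
def fill_introtexts(story_model, make_intro, batch_size=500):
    """Fill empty introtext fields using bulk_update (Django 2.2+).

    Hypothetical helper: story_model is the model class (e.g. Story) and
    make_intro is any function that maps content -> introtext.
    Returns the number of objects that were changed.
    """
    # Load every row that still needs an introtext (assumed to fit in memory).
    stories = list(story_model.objects.filter(introtext=""))

    # Compute the new values in Python; nothing is written yet.
    for story in stories:
        story.introtext = make_intro(story.content)

    # Write the changed column back in batches of batch_size objects.
    if stories:
        story_model.objects.bulk_update(stories, ["introtext"], batch_size=batch_size)
    return len(stories)


# Example call from the Django shell, using the same slice as above:
# fill_introtexts(Story, lambda c: c[:c.find(".")] + ".</p>")
```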

catavaran answered Apr 06 '23 03:04
