Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Update multiple objects at once in Django?

Tags:

python

django

I am using Django 1.9. I have a Django table that represents the value of a particular measure, by organisation by month, with raw values and percentiles:

class MeasureValue(models.Model):     org = models.ForeignKey(Org, null=True, blank=True)     month = models.DateField()     calc_value = models.FloatField(null=True, blank=True)     percentile = models.FloatField(null=True, blank=True) 

There are typically 10,000 or so per month. My question is about whether I can speed up the process of setting values on the models.

Currently, I calculate percentiles by retrieving all the measurevalues for a month using a Django filter query, converting it to a pandas dataframe, and then using scipy's rankdata to set ranks and percentiles. I do this because pandas and rankdata are efficient, able to ignore null values, and able to handle repeated values in the way that I want, so I'm happy with this method:

records = MeasureValue.objects.filter(month=month).values() df = pd.DataFrame.from_records(records) // use calc_value to set percentile on each row, using scipy's rankdata 

However, I then need to retrieve each percentile value from the dataframe, and set it back onto the model instances. Right now I do this by iterating over the dataframe's rows, and updating each instance:

for i, row in df.iterrows():     mv = MeasureValue.objects.get(org=row.org, month=month)     if (row.percentile is None) or np.isnan(row.percentile):         row.percentile = None     mv.percentile = row.percentile     mv.save() 

This is unsurprisingly quite slow. Is there any efficient Django way to speed it up, by making a single database write rather than tens of thousands? I have checked the documentation, but can't see one.

like image 400
Richard Avatar asked Apr 20 '16 17:04

Richard


1 Answers

Atomic transactions can reduce the time spent in the loop:

from django.db import transaction  with transaction.atomic():     for i, row in df.iterrows():         mv = MeasureValue.objects.get(org=row.org, month=month)          if (row.percentile is None) or np.isnan(row.percentile):              # if it's already None, why set it to None?             row.percentile = None          mv.percentile = row.percentile         mv.save() 

Django’s default behavior is to run in autocommit mode. Each query is immediately committed to the database, unless a transaction is actives.

By using with transaction.atomic() all the inserts are grouped into a single transaction. The time needed to commit the transaction is amortized over all the enclosed insert statements and so the time per insert statement is greatly reduced.

like image 183
ahmed Avatar answered Oct 03 '22 17:10

ahmed