I am using Django 1.9. I have a Django table that represents the value of a particular measure, by organisation by month, with raw values and percentiles:
class MeasureValue(models.Model): org = models.ForeignKey(Org, null=True, blank=True) month = models.DateField() calc_value = models.FloatField(null=True, blank=True) percentile = models.FloatField(null=True, blank=True)
There are typically 10,000 or so per month. My question is about whether I can speed up the process of setting values on the models.
Currently, I calculate percentiles by retrieving all the measurevalues for a month using a Django filter query, converting it to a pandas dataframe, and then using scipy's rankdata
to set ranks and percentiles. I do this because pandas and rankdata
are efficient, able to ignore null values, and able to handle repeated values in the way that I want, so I'm happy with this method:
records = MeasureValue.objects.filter(month=month).values() df = pd.DataFrame.from_records(records) // use calc_value to set percentile on each row, using scipy's rankdata
However, I then need to retrieve each percentile value from the dataframe, and set it back onto the model instances. Right now I do this by iterating over the dataframe's rows, and updating each instance:
for i, row in df.iterrows(): mv = MeasureValue.objects.get(org=row.org, month=month) if (row.percentile is None) or np.isnan(row.percentile): row.percentile = None mv.percentile = row.percentile mv.save()
This is unsurprisingly quite slow. Is there any efficient Django way to speed it up, by making a single database write rather than tens of thousands? I have checked the documentation, but can't see one.
Atomic transactions can reduce the time spent in the loop:
from django.db import transaction with transaction.atomic(): for i, row in df.iterrows(): mv = MeasureValue.objects.get(org=row.org, month=month) if (row.percentile is None) or np.isnan(row.percentile): # if it's already None, why set it to None? row.percentile = None mv.percentile = row.percentile mv.save()
Django’s default behavior is to run in autocommit mode. Each query is immediately committed to the database, unless a transaction is actives.
By using with transaction.atomic()
all the inserts are grouped into a single transaction. The time needed to commit the transaction is amortized over all the enclosed insert statements and so the time per insert statement is greatly reduced.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With