
Django: Bulk operations

Business:
I ran into a problem: when working with large datasets in the Django ORM, the canonical approach is to manipulate each element individually, which is very inefficient. So I decided to use raw SQL.

Substance:
I have some basic code that builds an SQL query to update the rows of a table and then commits it:

from myapp import Model
from django.db import connection, transaction

COUNT = Model.objects.count()
MYDATA = produce_some_differentiated_data()  # Creates an individual value for each row
cursor = connection.cursor()
statements = []  # renamed from "str" so the built-in type is not shadowed
for i in xrange(1, COUNT):
    statements.append("UPDATE database.table\n"
                      "SET field_to_modify={}\n"
                      "WHERE primary_key_field={};\n".format(MYDATA, i))

sql = ''.join(statements)
cursor.execute(sql)
transaction.commit_unless_managed()  # This causes the exception

On the last statement I get this, even when COUNT is small:

_mysql_exceptions.ProgrammingError: (2014, "Commands out of sync; you can't run this command now")

Maybe Django does not allow executing multiple SQL queries at once?

P.S. Closing the cursor before committing avoids the exception, but is that correct?

My expectations:
I'm looking for any solid solution for bulk operations (preferably within Django). I don't care whether it is the ORM or raw SQL; I would have stuck with the code pasted above if I could avoid the error. If there is no solution, it would be good, just out of curiosity, to at least know the reason for this exception.

What I have learned besides answers:
bulk_create was introduced in Django 1.4 for efficient multiple INSERT operations.

Gill Bates asked Dec 26 '12 01:12




1 Answer

Django 1.4+ has pretty decent support for bulk operations in its ORM, and you should see whether you can use that: it is the most portable approach and pleasant to work with too.

It supports not only setting the same value for a field on all objects (that's trivial), but also updating field values based on other fields and performing some limited calculations. I am not sure whether it fits your needs (that depends on how "produce_some_differentiated_data" works): some calculations you could do this way, some probably not. An example:

from django.db.models import F

image_id_list = [1, 5, 6]
Image.objects.filter(image_id__in=image_id_list).update(
    views_number=F('views_number') + 1)

The above example translates into SQL similar to:

UPDATE image SET views_number = views_number + 1 WHERE image_id IN (1,5,6);
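To see what that single statement does without setting up a Django project, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and the starting values are made up for illustration; only the UPDATE itself mirrors the SQL above.

```python
import sqlite3

# Hypothetical stand-in for the Image model's table; names and values are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image (image_id INTEGER PRIMARY KEY, views_number INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO image (image_id, views_number) VALUES (?, ?)",
    [(i, 10 * i) for i in range(1, 8)],
)

image_id_list = [1, 5, 6]
placeholders = ", ".join("?" for _ in image_id_list)

# One statement: the database does the arithmetic for every matching row.
cursor = conn.execute(
    "UPDATE image SET views_number = views_number + 1 "
    "WHERE image_id IN ({})".format(placeholders),
    image_id_list,
)
conn.commit()

updated_rows = cursor.rowcount  # number of rows changed by the single UPDATE
views = dict(conn.execute("SELECT image_id, views_number FROM image"))
```

Only the three listed rows change, and no row data has to travel to Python and back to compute the new values.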

A single statement like this is the fastest way to do a bulk update, far faster than running many queries. Packing multiple queries into one SQL string does not really speed up the operation; what does is a single query that operates on many rows at once. You can build fairly complex formulas in the update statement, so it is best if your "produce_some_differentiated_data" method can be expressed this way. Even if it cannot be done directly, you can probably modify the model and add some extra fields to make it possible. That might pay off if such bulk operations are executed often.
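When every row needs its own value and no formula covers it, one way to still issue a single statement is a CASE expression. The following is a minimal sketch under assumptions, not something this answer prescribes: it uses Python's built-in sqlite3 module rather than Django, and the helper build_case_update, the item table and its columns are hypothetical names.

```python
import sqlite3


def build_case_update(table, key_column, value_column, new_values):
    """Return (sql, params) for one UPDATE that sets a different value per row.

    Identifiers are interpolated into the SQL text, so they must come from
    trusted code; the keys and values themselves are bound as parameters.
    """
    if not new_values:
        raise ValueError("new_values must not be empty")
    keys = list(new_values)
    when_clauses = " ".join("WHEN ? THEN ?" for _ in keys)
    placeholders = ", ".join("?" for _ in keys)
    sql = (
        "UPDATE {table} SET {value} = CASE {key} {whens} END "
        "WHERE {key} IN ({placeholders})"
    ).format(table=table, value=value_column, key=key_column,
             whens=when_clauses, placeholders=placeholders)
    params = []
    for key in keys:
        params.extend([key, new_values[key]])  # parameters for the WHEN/THEN pairs
    params.extend(keys)                        # parameters for the IN (...) list
    return sql, params


# Hypothetical table with five rows, all starting at score 0.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, score INTEGER NOT NULL)")
conn.executemany("INSERT INTO item (id, score) VALUES (?, ?)",
                 [(i, 0) for i in range(1, 6)])

new_scores = {1: 10, 3: 30, 4: 40}  # primary key -> new value
sql, params = build_case_update("item", "id", "score", new_scores)
cursor = conn.execute(sql, params)
conn.commit()

changed = cursor.rowcount
scores = dict(conn.execute("SELECT id, score FROM item"))
```

The WHERE clause restricts the update to the listed keys, so rows without a new value are left untouched. Databases limit statement length and the number of bound parameters, so a large dataset would likely need to be split into chunks. Later Django releases expose the same idea in the ORM through Case/When conditional expressions (1.8+) and QuerySet.bulk_update (2.2+).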

From Django's documentation:

Django supports the use of addition, subtraction, multiplication, division and modulo arithmetic with F() objects, both with constants and with other F() objects.

More about it here: https://docs.djangoproject.com/en/dev/topics/db/queries/#updating-multiple-objects-at-once

Jarek Potiuk answered Oct 13 '22 13:10