
How to make Django QuerySet bulk delete() more efficient

Tags: python, orm, django

Setup:
Django 1.1.2, MySQL 5.1

Problem:

Blob.objects.filter(foo = foo) \
            .filter(status = Blob.PLEASE_DELETE) \
            .delete()

This snippet results in the ORM first generating a SELECT * FROM xxx_blob WHERE ... query, then doing a DELETE FROM xxx_blob WHERE id IN (BLAH); where BLAH is a ridiculously long list of IDs. Since I'm deleting a large number of blobs, this makes both me and the DB very unhappy.
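To make the two query patterns concrete, here is a rough sketch that reproduces them outside Django with the standard-library sqlite3 module. The table layout (integer foo and status columns), the helper names, and the sample data are assumptions for illustration only; the real ORM selects full rows and builds its SQL differently.

```python
import sqlite3


def make_db(n_rows=600):
    """Create an in-memory table that loosely mimics xxx_blob (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE xxx_blob (id INTEGER PRIMARY KEY, foo INTEGER, status INTEGER)"
    )
    conn.executemany(
        "INSERT INTO xxx_blob (foo, status) VALUES (?, ?)",
        [(i % 2, i % 3) for i in range(n_rows)],
    )
    return conn


def delete_two_step(conn, foo, status):
    """Pattern from the question: fetch the ids, then DELETE ... WHERE id IN (...)."""
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM xxx_blob WHERE foo = ? AND status = ?", (foo, status)
        )
    ]
    if not ids:
        return 0
    # One placeholder per id: this list grows with the number of rows deleted.
    placeholders = ",".join("?" * len(ids))
    cur = conn.execute(f"DELETE FROM xxx_blob WHERE id IN ({placeholders})", ids)
    return cur.rowcount


def delete_single(conn, foo, status):
    """Desired pattern: one DELETE that carries the filter itself."""
    cur = conn.execute(
        "DELETE FROM xxx_blob WHERE foo = ? AND status = ?", (foo, status)
    )
    return cur.rowcount
```

Both functions remove the same rows; the difference is that the first one ships every matching id back to the client and then into the DELETE statement.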

Is there a reason for this? I don't see why the ORM can't convert the above snippet into a single DELETE query. Is there a way to optimize this without resorting to raw SQL?

asked Feb 01 '11 20:02 by svintus



2 Answers

For those who are still looking for an efficient way to bulk delete in Django, here's a possible solution:

There are two reasons delete() may be so slow: 1) Django has to make sure cascading deletes work properly, so it looks for foreign key references to your models; 2) Django has to handle the pre_delete and post_delete signals for your models.

If you know your models don't have cascade deleting or signals to be handled, you can accelerate this process by resorting to the private API _raw_delete as follows:

queryset._raw_delete(queryset.db) 

More details in here. Please note that Django already tries to handle these events efficiently, though using the raw delete is, in many situations, much more efficient.
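A minimal sketch of wrapping that call, assuming a Django version that exposes the private _raw_delete method. The name fast_delete is a hypothetical helper, not part of Django, and since the API is private its behavior and return value may differ between releases.

```python
def fast_delete(queryset):
    """Delete the rows matched by `queryset` through the private raw-delete path.

    Hypothetical helper. It relies on QuerySet._raw_delete, a private API
    that may change between Django releases. It skips cascade handling and
    the pre_delete/post_delete signals, so it is only appropriate for models
    that need neither.
    """
    # queryset.db is the alias of the database this queryset would run against.
    return queryset._raw_delete(queryset.db)
```

With the model from the question, this might be called as fast_delete(Blob.objects.filter(foo=foo, status=Blob.PLEASE_DELETE)).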

answered Oct 08 '22 07:10 by Anoyz


Not without writing your own custom SQL or managers or something; they are apparently working on it though.

http://code.djangoproject.com/ticket/9519
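For reference, the custom SQL route could look roughly like the sketch below. It assumes a recent Django version where connection.cursor() can be used as a context manager; the column names foo_id and status and the function name delete_blobs_raw are guesses for illustration, so check them against the actual table.

```python
def delete_blobs_raw(connection, foo_id, status):
    """Issue one parameterized DELETE through a Django database connection.

    `connection` is expected to be django.db.connection (or an entry of
    django.db.connections). The table name comes from the question; the
    column names are placeholders.
    """
    sql = "DELETE FROM xxx_blob WHERE foo_id = %s AND status = %s"
    with connection.cursor() as cursor:
        # Parameters are passed separately so the driver handles quoting.
        cursor.execute(sql, [foo_id, status])
        return cursor.rowcount
```

Like the raw delete above, this bypasses cascades and signals, so anything that depends on them has to be handled by hand.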

answered Oct 08 '22 06:10 by Dominic Santos