
Django - Efficiently bulk create inherited models

Python 2.7.9 Django 1.7 MySQL 5.6

I would like to populate a whole bunch of object instances belonging to multiple classes, stack them up into a single create()-like query, open a database connection, execute the query, then close. My main motivation is performance, but code compactness is also a plus.

The functionality of bulk_create() appears to be exactly what I want, but I am in violation of at least one of the caveats listed in the documentation, namely:

It does not work with many-to-many relationships.

and

It does not work with child models in a multi-table inheritance scenario.

These limitations are also described in the source code thus:

# So this case is fun. When you bulk insert you don't get the primary
# keys back (if it's an autoincrement), so you can't insert into the
# child tables which references this. There are two workarounds, 1)
# this could be implemented if you didn't have an autoincrement pk,
# and 2) you could do it by doing O(n) normal inserts into the parent
# tables to get the primary keys back, and then doing a single bulk
# insert into the childmost table. Some databases might allow doing
# this by using RETURNING clause for the insert query. We're punting
# on these for now because they are relatively rare cases.

But the error returned when I attempt it is the generic

ValueError: Can't bulk create an inherited model

As far as I can tell, my models do not contain any many-to-many fields or foreign keys. It is not entirely clear to me what multi-table inheritance scenarios the documentation is referring to, so I'm not sure whether that is my problem. I was hoping I could slip by with a structure that looks like this, but I got the generic error anyway, so no dice:

child class with OneToOneField---\
                                  \   
child class with OneToOneField----->---concrete parent class
                                  /
child class with OneToOneField---/
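
A minimal models.py sketch mirroring that diagram (the names here are placeholders, not my actual models) would be something like:

from django.db import models

class Parent(models.Model):
    # hypothetical concrete parent class
    name = models.CharField(max_length=100)

class ChildA(Parent):
    # multi-table inheritance: Django adds an implicit OneToOneField
    # (parent_ptr) back to Parent, which is what bulk_create() rejects
    value_a = models.IntegerField()

class ChildB(Parent):
    value_b = models.IntegerField()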

As for the workarounds suggested in the source, #1 is not an option for me, and #2 does not look appealing because I assume it would entail sacrificing the performance gains that I'm going for.
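
For completeness, my understanding of what workaround #2 would entail with the placeholder models above (the table and column names follow Django's defaults and are assumptions on my part):

from django.db import connection, transaction
from myapp.models import Parent   # hypothetical app holding the models sketched above

def insert_children(rows):
    # Save each Parent normally so MySQL hands back the autoincrement PKs,
    # then do one multi-row INSERT into the childmost table with raw SQL.
    with transaction.atomic():
        parents = []
        for row in rows:
            p = Parent(name=row['name'])
            p.save()                      # O(n) inserts, but the PKs come back
            parents.append(p)

        cursor = connection.cursor()
        cursor.executemany(
            "INSERT INTO myapp_childa (parent_ptr_id, value_a) VALUES (%s, %s)",
            [(p.pk, row['value_a']) for p, row in zip(parents, rows)],
        )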

Are there other workarounds that could simulate bulk_create() while handling inheritance like this, without forgoing the performance gains? Do I need to drop down to raw SQL? I would not mind collecting the objects separately and executing a separate INSERT/create() for each child object type.

asked Dec 30 '14 by WAF

1 Answer

The workaround I settled on was wrapping all of my collected create()s in a with transaction.atomic(): block. This greatly reduced the running time: every INSERT runs inside a single transaction, so the commit happens once at the end instead of after each individual query.
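
A minimal sketch of that pattern, with placeholder model and variable names rather than my real code:

from django.db import transaction
from myapp.models import ChildA   # hypothetical child model

# stand-in for the objects collected in Python beforehand
collected_rows = [
    {'name': 'a', 'value_a': 1},
    {'name': 'b', 'value_a': 2},
]

with transaction.atomic():
    for row in collected_rows:
        # each create() still issues its INSERTs, but the commit happens
        # once at the end of the block instead of after every statement
        ChildA.objects.create(name=row['name'], value_a=row['value_a'])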

A downside is that if any error is encountered, all changes are rolled back and the database is left untouched. That could be remedied by chunking the create()s into batches and wrapping a transaction around each batch. (In my case chunking wasn't the desired behavior, because I wanted all of the data or none of it.)
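
If partial progress were acceptable, the chunked variant might look roughly like this (the batch size is arbitrary):

from django.db import transaction
from myapp.models import ChildA   # hypothetical child model

BATCH_SIZE = 500   # arbitrary; tune for your data

def create_in_batches(rows):
    # Commit after every BATCH_SIZE creates, so an error only rolls back
    # the current batch instead of everything inserted so far.
    for start in range(0, len(rows), BATCH_SIZE):
        with transaction.atomic():
            for row in rows[start:start + BATCH_SIZE]:
                ChildA.objects.create(name=row['name'], value_a=row['value_a'])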

answered Nov 14 '22 by WAF