
How do I perform a batch insert in Django?

Tags:

sql

django

In MySQL, you can insert multiple rows into a table in a single query (for n > 0):

INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9), ..., (n-2, n-1, n);

Is there a way to achieve the above with Django queryset methods? Here's an example:

values = [(1, 2, 3), (4, 5, 6), ...]

for value in values:
    SomeModel.objects.create(first=value[0], second=value[1], third=value[2])

I believe the above issues a separate INSERT query for each iteration of the for loop. I'm looking for a single query; is that possible in Django?

Asked Apr 16 '10 by Thierry Lam


People also ask

What is bulk create Django?

The bulk_create() method is one of the ways to insert multiple records into a database table; the answers below show how it's used to insert multiple rows into a Django database table.

How do I save multiple items in Django?

To create multiple records based on a Django model you can use the built-in bulk_create() method. The advantage of the bulk_create() method is that it creates all entries in a single query, so it's very efficient if you have a list of a dozen or a hundred entries you wish to create.

What does Django bulk create return?

According to the Django documentation, bulk_create() saves the whole list of objects to the database in one shot, but the objects' ids are not retrieved on backends that can't return IDs from an insert. It's a good fit when you do large insertions and don't need to process the saved objects further.

Is bulk create faster in Django?

Luckily, Django provides a better mechanism for inserting a large number of models into the database: bulk_create. bulk_create is a significant improvement over saving models one at a time and we used it for a while at Cedar. Unfortunately, it wasn't fast enough when we loaded more than 10,000 models.
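
For very large loads, bulk_create() also accepts a batch_size argument that splits the objects across several INSERT statements instead of one enormous query. A minimal sketch (the Reading model and its value field are made up for illustration):

from myapp.models import Reading  # hypothetical model, for illustration only

rows = [Reading(value=i) for i in range(50000)]

# batch_size caps how many rows go into each INSERT statement,
# so very large loads don't produce one enormous query.
Reading.objects.bulk_create(rows, batch_size=1000)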

How do I manage database transactions in a Django project?

Django gives you a few ways to control how database transactions are managed. Django’s default behavior is to run in autocommit mode. Each query is immediately committed to the database, unless a transaction is active.

What is the use of bulk_create () in Django?

Today, we use Django's Model.objects.bulk_create() regularly to help speed up operations that insert a lot of data into a database. One of those projects involves processing a spreadsheet with multiple tabs, each of which might contain thousands or even tens of thousands of records, some of which might correspond to multiple model classes.

Is it possible to create multiple objects in one Django model?

Strictly speaking, no: Django models are objects rather than tables, so table-level actions don't apply to them directly. Django creates an object and then inserts its data into the table, so you can't create multiple objects in one step that way. That said, considering the answers above that actually work, saying it is not possible seems overstated.



4 Answers

These answers are outdated: bulk_create() was introduced in Django 1.4:

https://docs.djangoproject.com/en/dev/ref/models/querysets/#bulk-create
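
With bulk_create(), the loop from the question collapses into one call (and, on most backends, a single INSERT). A minimal sketch using the question's SomeModel:

values = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# Build unsaved model instances, then hand them to bulk_create()
# so they are inserted in one query rather than one per row.
SomeModel.objects.bulk_create([
    SomeModel(first=first, second=second, third=third)
    for first, second, third in values
])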

Answered by Fred


I recently looked for such a thing myself (inspired by QuerySet.update(), as I imagine you are too). To my knowledge, no bulk create exists in the current production framework (1.1.1 as of today). We ended up creating a custom manager for the model that needed bulk-create, and created a function on that manager to build an appropriate SQL statement with the sequence of VALUES parameters.

Something like (apologies if this does not work... hopefully I've adapted this runnably from our code):

from django.db import models, connection

class MyManager(models.Manager):

    def create_in_bulk(self, values):
        # The table name comes from the model so the raw SQL stays in sync
        # with the model definition below; the column list is hard-coded.
        base_sql = "INSERT INTO %s (foo) VALUES " % self.model._meta.db_table
        values_sql = []
        values_data = []

        for value_list in values:
            # One "(%s, ...)" placeholder group per row; the raw values are
            # collected separately so the database driver handles escaping.
            placeholders = ['%s'] * len(value_list)
            values_sql.append("(%s)" % ','.join(placeholders))
            values_data.extend(value_list)

        sql = '%s%s' % (base_sql, ', '.join(values_sql))

        curs = connection.cursor()
        curs.execute(sql, values_data)

class MyObject(models.Model):
    # model definition as usual... assume a single column:
    foo = models.CharField(max_length=128)

    # custom manager
    objects = MyManager()

MyObject.objects.create_in_bulk([('hello',), ('bye',), ('c',)])

This approach does run the risk of being very specific to a particular database. In our case, we wanted the function to return the IDs just created, so we had a postgres-specific query in the function to generate the requisite number of IDs from the primary key sequence for the table that represents the object. That said, it does perform significantly better in tests versus iterating over the data and issuing separate QuerySet.create() statements.
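
For reference, on PostgreSQL a simpler way to get the new IDs back (instead of pre-allocating them from the primary key sequence, as described above) is to append a RETURNING clause to the INSERT. A rough sketch that would replace the last two lines of create_in_bulk above; it assumes the primary key column is named id, which may not match your schema:

# PostgreSQL-specific: RETURNING hands back the generated primary keys
# in the same round trip as the INSERT itself (assumes the pk column is "id").
curs = connection.cursor()
curs.execute(sql + " RETURNING id", values_data)
new_ids = [row[0] for row in curs.fetchall()]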

Answered by Jarret Hardie


Here is a way to do batch inserts that still goes through Django's ORM (and thus retains the many benefits the ORM provides). This approach involves subclassing the InsertQuery class, as well as creating a custom manager that prepares model instances for insertion into the database in much the same way that Django's save() method does. Most of the code for the BatchInsertQuery class below is taken straight from InsertQuery, with just a few key lines added or modified. To use the batch_insert method, pass in a set of model instances that you want to insert into the database. This approach frees the code in your views from having to worry about translating model instances into valid SQL values; the manager class, in conjunction with the BatchInsertQuery class, handles that.

from django.db import models, connection
from django.db.models.sql import InsertQuery

class BatchInsertQuery( InsertQuery ):

    ####################################################################

    def as_sql(self):
        """
        Constructs a SQL statement for inserting all of the model instances
        into the database.

        Differences from base class method:        

        - The VALUES clause is constructed differently to account for the
        grouping of the values (actually, placeholders) into
        parenthetically-enclosed groups. I.e., VALUES (a,b,c),(d,e,f)
        """
        qn = self.connection.ops.quote_name
        opts = self.model._meta
        result = ['INSERT INTO %s' % qn(opts.db_table)]
        result.append('(%s)' % ', '.join([qn(c) for c in self.columns]))
        result.append( 'VALUES %s' % ', '.join( '(%s)' % ', '.join( 
            values_group ) for values_group in self.values ) ) # This line is different
        params = self.params
        if self.return_id and self.connection.features.can_return_id_from_insert:
            col = "%s.%s" % (qn(opts.db_table), qn(opts.pk.column))
            r_fmt, r_params = self.connection.ops.return_insert_id()
            result.append(r_fmt % col)
            params = params + r_params
        return ' '.join(result), params

    ####################################################################

    def insert_values( self, insert_values ):
        """
        Adds the insert values to the instance. Can be called multiple times
        for multiple instances of the same model class.

        Differences from base class method:

        -Clears self.columns so that self.columns won't be duplicated for each
        set of inserted_values.        
        -appends the insert_values to self.values instead of extends so that
        the values (actually the placeholders) remain grouped separately for
        the VALUES clause of the SQL statement. I.e., VALUES (a,b,c),(d,e,f)
        -Removes inapplicable code
        """
        self.columns = [] # This line is new

        placeholders, values = [], []
        for field, val in insert_values:
            placeholders.append('%s')

            self.columns.append(field.column)
            values.append(val)

        self.params += tuple(values)
        self.values.append( placeholders ) # This line is different

########################################################################

class ManagerEx( models.Manager ):
    """
    Extended model manager class.
    """
    def batch_insert( self, *instances ):
        """
        Issues a batch INSERT using the specified model instances.
        """
        cls = instances[0].__class__
        query = BatchInsertQuery( cls, connection )
        for instance in instances:

            values = [ (f, f.get_db_prep_save( f.pre_save( instance, True ) ))
                       for f in cls._meta.local_fields ]
            query.insert_values( values )

        return query.execute_sql()

########################################################################

class MyModel( models.Model ):
    myfield = models.CharField(max_length=255)
    objects = ManagerEx()

########################################################################

# USAGE:
object1 = MyModel(myfield="foo")
object2 = MyModel(myfield="bar") 
object3 = MyModel(myfield="bam")
MyModel.objects.batch_insert(object1, object2, object3)

Answered by Brian from QuantRocket


You might get the performance you need by using manual transactions: issue all the inserts inside one transaction, then commit them all at once. Hopefully this will help you: http://docs.djangoproject.com/en/dev/topics/db/transactions/
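
On modern Django versions (1.6+) the usual way to do this is transaction.atomic(). A minimal sketch that wraps the question's loop in one transaction; each row still gets its own INSERT, but they are all committed together:

from django.db import transaction

values = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# Everything inside the block commits (or rolls back) as a unit,
# removing the per-row commit overhead of autocommit mode.
with transaction.atomic():
    for first, second, third in values:
        SomeModel.objects.create(first=first, second=second, third=third)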

Answered by Justin Abrahms