Django Import-Export: IntegrityError when trying to insert a duplicate record in field(s) with unique or unique_together constraints

Update

I have filed a feature request. The idea is to pass on the IntegrityError raised by the database when a unique or unique_together constraint rejects a record that already exists in the database.


I have the following model:

class Compositions(models.Model):
    composer_key = models.ForeignKey(
        Composer,
        )
    composition = models.CharField(
        max_length=383,
        )

    class Meta(object):
        unique_together = (('composer_key', 'composition'), )

When I import a CSV file through the admin interface with django-import-export, without providing an id for each entry, the import is interrupted with an integrity error if any (composer_key, composition) pair in the file already exists in the database:

duplicate key value violates unique constraint "data_compositions_composer_key_id_12f91ce7dbac16bf_uniq"
DETAIL:  Key (composer_key_id, composition)=(2, Star Wars) already exists.

The CSV file is the following:

id  composer_key    composition
        1           Hot Stuff
        2           Star Wars

The idea was to use skip_row and implement it in the admin.

admin.py:

class CompositionsResource(resources.ModelResource):

    class Meta:
        model = Compositions
        skip_unchanged = True
        report_skipped = True


class CompositionsAdmin(ImportExportModelAdmin):
    resource_class = CompositionsResource


admin.site.register(Compositions, CompositionsAdmin)

This does not solve the problem, however, because skip_row expects an id in the CSV file in order to check whether each row is identical to the corresponding database entry.

Since the database already performs this check when unique or unique_together is used, wouldn't it be effective to catch this error and then return skip_row = True, or alternatively to pass the error on?
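To make the idea concrete outside of Django, here is a minimal sketch using the standard-library sqlite3 module instead of django-import-export. The table layout mirrors the model above, and the helper name import_rows is hypothetical. It only illustrates the pattern of letting the database enforce the constraint and skipping the rows it rejects; it is not how the library itself behaves.

```python
import sqlite3


def import_rows(conn, rows):
    """Insert rows one by one and skip those the UNIQUE constraint rejects.

    import_rows is a hypothetical helper for illustration only.
    """
    saved, skipped = [], []
    for composer_key_id, composition in rows:
        try:
            # "with conn" commits on success and rolls back on error
            with conn:
                conn.execute(
                    "INSERT INTO compositions (composer_key_id, composition) "
                    "VALUES (?, ?)",
                    (composer_key_id, composition),
                )
            saved.append((composer_key_id, composition))
        except sqlite3.IntegrityError:
            # The database refused the duplicate pair; record it and move on
            skipped.append((composer_key_id, composition))
    return saved, skipped


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compositions ("
    " id INTEGER PRIMARY KEY,"
    " composer_key_id INTEGER NOT NULL,"
    " composition TEXT NOT NULL,"
    " UNIQUE (composer_key_id, composition))"
)
# Simulate a record that already exists before the import
conn.execute(
    "INSERT INTO compositions (composer_key_id, composition) "
    "VALUES (2, 'Star Wars')"
)
conn.commit()

saved, skipped = import_rows(conn, [(1, "Hot Stuff"), (2, "Star Wars")])
print("saved:", saved)
print("skipped:", skipped)
```

Each row is wrapped in its own small transaction, so one rejected row does not undo the rows saved before it.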

asked Jan 12 '16 12:01 by raratiru


3 Answers

Only one change is needed, and you can keep using django-import-export.

models.py

    class Compositions(models.Model):
        composer_key = models.ForeignKey(
            Composer,
            )
        composition = models.CharField(
            max_length=383,
            unique=False
            )
        date_created = models.DateTimeField(default=timezone.now)

        class Meta(object):
            unique_together = (('composer_key','composition'),)

Override save_instance with a try block and ignore the error when saving fails.

admin.py:

        class CompositionsResource(resources.ModelResource):

            class Meta:
                model = Compositions
                skip_unchanged = True
                report_skipped = True

            def save_instance(self, instance, using_transactions=True, dry_run=False):
                try:
                    super(CompositionsResource, self).save_instance(instance, using_transactions, dry_run)
                except IntegrityError:
                    pass

        class CompositionsAdmin(ImportExportModelAdmin):
            resource_class = CompositionsResource

        admin.site.register(Compositions, CompositionsAdmin)

and import IntegrityError:

from django.db import IntegrityError

answered Nov 09 '22 01:11 by blacker


A note on the accepted answer: it gives the desired result, but with large files it is costly in both disk usage and import time.

A more efficient approach I've been using (after spending a lot of time going through the docs) is to override skip_row, and use a set of tuples as a unique constraint as part of the class. I still override save_instance as the other answer suggests to handle IntegrityErrors that get through, of course.

Python sets don't create duplicate entries, so they seem appropriate for this kind of unique index.

class CompositionsResource(resources.ModelResource):
  set_unique = set()

  class Meta:
    model = Compositions
    skip_unchanged = True
    report_skipped = True

  def before_import(self, dataset, using_transactions, dry_run, **kwargs):
    # Clear out anything that may be there from a dry_run,
    #  such as the admin mixin preview
    self.set_unique = set()

  def skip_row(self, instance, original):
    composer_key = instance.composer_key  # Could also use composer_key_id
    composition = instance.composition
    tuple_unique = (composer_key, composition)

    if tuple_unique in self.set_unique:
      return True
    else:
      self.set_unique.add(tuple_unique)
    return super(CompositionsResource, self).skip_row(instance, original)

  # save_instance override should still go here to pass on IntegrityError

This approach will at least cut down on duplicates encountered within the same dataset. I used it to deal with multiple flat files that were ~60000 lines each, but had lots of repetitive/nested foreign keys. This made that initial data import way faster.
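The set-based check can be tried in isolation, without Django. The following is a minimal sketch in plain Python; the function name drop_repeated_rows and the sample rows are made up for illustration. It shows how a set of (composer_key_id, composition) tuples filters out pairs repeated within a single dataset.

```python
def drop_repeated_rows(rows):
    """Keep only the first occurrence of each (composer_key_id, composition) pair.

    drop_repeated_rows is a hypothetical helper for illustration only.
    """
    seen = set()   # pairs already encountered in this dataset
    kept = []
    for composer_key_id, composition in rows:
        key = (composer_key_id, composition)
        if key in seen:
            continue  # same pair seen earlier in this dataset: skip it
        seen.add(key)
        kept.append(key)
    return kept


rows = [
    (1, "Hot Stuff"),
    (2, "Star Wars"),
    (2, "Star Wars"),  # repeated pair within the same dataset
    (1, "Star Wars"),  # same composition, different composer: kept
]
kept = drop_repeated_rows(rows)
print(kept)
```

Note that this only catches repeats within the dataset being imported. Pairs that already exist in the database still reach the unique constraint, which is why the save_instance override remains necessary.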

answered Nov 09 '22 02:11 by Supra621


models.py:

class Compositions(models.Model):
    composer_key = models.ForeignKey(
        Composer,
        )
    composition = models.CharField(
        max_length=383,
        unique=False
        )
    date_created = models.DateTimeField(default=timezone.now)

    class Meta(object):
        unique_together = (('composer_key','composition'),)

This is a script I wrote 'on the fly' for the above model in order to discard duplicate entries automatically. I saved it to ./project_name/csv.py and import it from the shell after filling the relevant columns of the file duc.csv with data. The file should contain only data, with no header row.

$ ./manage.py shell
>>> from project_name import csv

csv.py:

import csv

from django.utils import timezone

from data.models import Composer, Compositions

filename = '/path/to/duc.csv'

with open(filename, newline='') as csvfile:
    all_lines = csv.reader(csvfile, delimiter=',', quotechar='"')
    for each_line in all_lines:
        print(each_line)
        try:
            instance = Compositions(
                id=None,
                date_created=timezone.now(),
                composer_key=Composer.objects.get(id=each_line[2]),
                composition=each_line[3]
            )
            instance.save()
            print("Saved composition: {0}".format(each_line[3]))
        # Narrow this to the expected exception type, e.g. django.db.IntegrityError
        except Exception as exc:
            # Report the error (mostly for debugging) and continue with the next row
            print(exc)

answered Nov 09 '22 01:11 by raratiru