Update
I have filed a feature request. The idea is to pass on the IntegrityError produced by the database when unique or unique_together rejects a record that already exists in the database.
I have the following model:
class Compositions(models.Model):
    composer_key = models.ForeignKey(
        Composer,
    )
    composition = models.CharField(
        max_length=383,
    )

    class Meta(object):
        unique_together = (('composer_key', 'composition'), )
Using django-import-export in the admin interface, without providing an id for each entry in the csv file, ... if one pair from the csv file already exists in the database, the procedure is interrupted with an integrity error:
duplicate key value violates unique constraint "data_compositions_composer_key_id_12f91ce7dbac16bf_uniq"
DETAIL: Key (composer_key_id, composition)=(2, Star Wars) already exists.
The CSV file is the following:
id    composer_key    composition
      1               Hot Stuff
      2               Star Wars
The idea was to use skip_row and implement it in the admin.
admin.py:
class CompositionsResource(resources.ModelResource):

    class Meta:
        model = Compositions
        skip_unchanged = True
        report_skipped = True


class CompositionsAdmin(ImportExportModelAdmin):
    resource_class = CompositionsResource


admin.site.register(Compositions, CompositionsAdmin)
This will not cure the problem, however, because skip_row expects an id in the csv file in order to check whether each row matches a specific existing database entry. Considering that this check can be performed by the database itself when unique (or unique_together) is used, wouldn't it be effective to catch this error and then return skip_row = True, or alternatively pass on this error?
Only one change is needed, and you can keep using django-import-export.
models.py:
class Compositions(models.Model):
    composer_key = models.ForeignKey(
        Composer,
    )
    composition = models.CharField(
        max_length=383,
        unique=False
    )
    date_created = models.DateTimeField(default=timezone.now)

    class Meta(object):
        unique_together = (('composer_key', 'composition'),)
Override save_instance with a try/except and ignore the error when it fails.
admin.py:
class CompositionsResource(resources.ModelResource):

    class Meta:
        model = Compositions
        skip_unchanged = True
        report_skipped = True

    def save_instance(self, instance, using_transactions=True, dry_run=False):
        try:
            super(CompositionsResource, self).save_instance(instance, using_transactions, dry_run)
        except IntegrityError:
            pass


class CompositionsAdmin(ImportExportModelAdmin):
    resource_class = CompositionsResource


admin.site.register(Compositions, CompositionsAdmin)
and import this:
from django.db import IntegrityError
A note on the accepted answer: it will give the desired result, but with large files it will hammer disk usage and take a long time.
A more efficient approach I've been using (after spending a lot of time going through the docs) is to override skip_row and keep a set of tuples representing the unique constraint as part of the class. I still override save_instance as the other answer suggests to handle IntegrityErrors that get through, of course.
Python sets don't allow duplicate entries, so they seem appropriate for this kind of unique index.
class CompositionsResource(resources.ModelResource):
    set_unique = set()

    class Meta:
        model = Compositions
        skip_unchanged = True
        report_skipped = True

    def before_import(self, dataset, using_transactions, dry_run, **kwargs):
        # Clear out anything that may be there from a dry_run,
        # such as the admin mixin preview
        self.set_unique = set()

    def skip_row(self, instance, original):
        composer_key = instance.composer_key  # Could also use composer_key_id
        composition = instance.composition
        tuple_unique = (composer_key, composition)

        if tuple_unique in self.set_unique:
            return True
        else:
            self.set_unique.add(tuple_unique)

        return super(CompositionsResource, self).skip_row(instance, original)

    # save_instance override should still go here to pass on IntegrityError
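For completeness, the save_instance override referenced by the comment above can be the same one from the accepted answer (it assumes the same from django.db import IntegrityError import):
    def save_instance(self, instance, using_transactions=True, dry_run=False):
        # Swallow duplicates rejected by the unique_together constraint
        try:
            super(CompositionsResource, self).save_instance(instance, using_transactions, dry_run)
        except IntegrityError:
            pass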
This approach will at least cut down on duplicates encountered within the same dataset. I used it to deal with multiple flat files that were ~60000 lines each, but had lots of repetitive/nested foreign keys. This made that initial data import way faster.
models.py:
class Compositions(models.Model):
    composer_key = models.ForeignKey(
        Composer,
    )
    composition = models.CharField(
        max_length=383,
        unique=False
    )
    date_created = models.DateTimeField(default=timezone.now)

    class Meta(object):
        unique_together = (('composer_key', 'composition'),)
This is a script I have written 'on the fly' for the above model in order to automatically discard duplicate entries. I saved it to ./project_name/csv.py and import it from the shell after filling the relevant columns of the file duc.csv with data. The columns should not contain headers, only data.
$ ./manage.py shell
>>> from project_name import csv
csv.py:
from data.models import Composer, Compositions
import csv
import sys, traceback
from django.utils import timezone

filename = '/path/to/duc.csv'

with open(filename, newline='') as csvfile:
    all_lines = csv.reader(csvfile, delimiter=',', quotechar='"')
    for each_line in all_lines:
        print(each_line)
        try:
            instance = Compositions(
                id=None,
                date_created=timezone.now(),
                composer_key=Composer.objects.get(id=each_line[2]),
                composition=each_line[3]
            )
            instance.save()
            print("Saved composition: {0}".format(each_line[3]))
        except:  # exception type must be inserted here
            exc_type, exc_value, exc_traceback = sys.exc_info()  # debugging mostly
            print(exc_value)
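If you only want to discard duplicate rows rather than swallow every possible failure, the bare except can be narrowed to IntegrityError, in line with the rest of the thread. A minimal sketch of that variant of the loop, reusing the imports and filename from the script above (the print messages are just illustrative):
from django.db import IntegrityError

with open(filename, newline='') as csvfile:
    all_lines = csv.reader(csvfile, delimiter=',', quotechar='"')
    for each_line in all_lines:
        try:
            Compositions(
                date_created=timezone.now(),
                composer_key=Composer.objects.get(id=each_line[2]),
                composition=each_line[3]
            ).save()
            print("Saved composition: {0}".format(each_line[3]))
        except IntegrityError:
            # The (composer_key, composition) pair already exists; skip it
            print("Skipped duplicate: {0}".format(each_line[3]))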