I am trying to import data from multiple APIs, and there is a chance of duplicates. I am trying to bulk-create without duplication. None of the API sources provides a unique identifier. I am wondering what the best way to handle this situation is. What I have tried is as follows:
    if 'rounds' in response:
        print('=== Syncing Rounds ===')
        rounds = response.get('rounds')
        objs = [
            Round(
                name=item.get('name'),
                season=Season.objects.get(id=item.get('seasonId')),
                competition=Competition.objects.get(id=item.get('competitionId')),
                round_number=item.get('roundNumber'),
            )
            for item in rounds
        ]
        Round.objects.bulk_create(
            objs,
            update_conflicts=True,
            update_fields=['name', 'season', 'competition', 'round_number'],
            unique_fields=['id'],
        )
I tried setting ignore_conflicts = True but that approach didn't help me.
The round numbers range from 1 to 30 and the season is the year. In this situation I cannot make a single field unique, such as round number, season, or competition; uniqueness has to be checked across all three. For example, there can be only one row for Round 1, 2023, for competition 112. This entire combination is unique.
Goal
The end goal is to either ensure no duplicate entries or update existing rows.
One hack solution (described as a hack by its own author) is given in the question "Bulk insert on multi-column unique constraint Django".
---Update---
Round Model
    from django.db import models

    class Round(models.Model):
        name = models.CharField(max_length=100)
        round_number = models.SmallIntegerField(null=True)
        season = models.ForeignKey(Season, on_delete=models.CASCADE)
        competition = models.ForeignKey(Competition, on_delete=models.CASCADE)
        start = models.DateTimeField(null=True, blank=True)
        end = models.DateTimeField(null=True, blank=True)
        tries = models.SmallIntegerField(default=0)
        points = models.SmallIntegerField(default=0)

        class Meta:
            constraints = [
                # One row per (round_number, season, competition) combination.
                models.UniqueConstraint(
                    fields=['round_number', 'season', 'competition'],
                    name='unique_round',
                ),
            ]
I have tried using constraints, but no dice.
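Your unique_fields cannot be id, since the id is assigned by the database and is not determined by the data in the object, so an incoming row never conflicts on it. The unique_fields name the fields whose combined values identify an existing row: when a new object matches an existing row on all of them, that row is updated instead of a second one being inserted. In case the season_id, competition_id, and round_number are the same, we can for example update the name.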
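To see the effect of a composite conflict target in isolation, here is a minimal sketch that uses Python's built-in sqlite3 module rather than Django. The table and column names are stand-ins chosen to mirror the Round model, and the SQL only approximates what an upsert on (round_number, season_id, competition_id) looks like; it is not necessarily the exact statement Django emits.

```python
import sqlite3

# Hypothetical stand-in for the Round table; columns mirror the model in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE round (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        round_number INTEGER,
        season_id INTEGER NOT NULL,
        competition_id INTEGER NOT NULL,
        UNIQUE (round_number, season_id, competition_id)
    )
    """
)

# The conflict target is the three-column combination, not the id.
# Requires SQLite 3.24 or newer for ON CONFLICT ... DO UPDATE.
UPSERT = """
    INSERT INTO round (name, round_number, season_id, competition_id)
    VALUES (?, ?, ?, ?)
    ON CONFLICT (round_number, season_id, competition_id)
    DO UPDATE SET name = excluded.name
"""

# The first import creates the row.
conn.executemany(UPSERT, [("Round 1", 1, 2023, 112)])
# The second import hits the same combination, so only the name is updated;
# the row for round 2 is new and gets inserted.
conn.executemany(
    UPSERT,
    [("Opening Round", 1, 2023, 112), ("Round 2", 2, 2023, 112)],
)

rows = conn.execute(
    "SELECT name, round_number, season_id, competition_id "
    "FROM round ORDER BY round_number"
).fetchall()
print(rows)
```

The table ends up with two rows rather than three, because the second import of round 1 is treated as an update. The same idea only works if a unique constraint on exactly those columns exists in the database, which is what the UniqueConstraint in the model's Meta provides once it has been migrated.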
Your view is also not very efficient. Yes, .bulk_create(…) will save a lot of insert queries, but the main bottleneck is fetching the Season and Competition with a separate query for every item. That is not necessary. If we know for sure that the referenced objects exist, we can assign the foreign key ids directly:
    if 'rounds' in response:
        objs = [
            Round(
                name=item['name'],
                season_id=item['seasonId'],
                competition_id=item['competitionId'],
                round_number=item['roundNumber'],
            )
            for item in response['rounds']
        ]
        Round.objects.bulk_create(
            objs,
            update_conflicts=True,
            update_fields=['name'],
            unique_fields=['season_id', 'competition_id', 'round_number'],
        )
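Since the data comes from several APIs, the same round may also appear twice within a single batch. Some databases reject that: PostgreSQL, for example, refuses an INSERT ... ON CONFLICT DO UPDATE that would affect the same row twice in one statement. A possible way around it is to collapse the payload on the three-field key before building the Round objects. The sketch below assumes the item keys shown in the question (seasonId, competitionId, roundNumber); dedupe_rounds is a hypothetical helper name, and "last one wins" is just one reasonable policy.

```python
def dedupe_rounds(items):
    """Keep one payload per (seasonId, competitionId, roundNumber); the last one wins."""
    latest = {}
    for item in items:
        key = (item["seasonId"], item["competitionId"], item["roundNumber"])
        # A later entry replaces an earlier one with the same key;
        # the dict keeps the position of the first occurrence.
        latest[key] = item
    return list(latest.values())


# Example payload merged from two sources: round 1 appears twice.
rounds = [
    {"name": "Round 1", "seasonId": 2023, "competitionId": 112, "roundNumber": 1},
    {"name": "Round 2", "seasonId": 2023, "competitionId": 112, "roundNumber": 2},
    {"name": "Opening Round", "seasonId": 2023, "competitionId": 112, "roundNumber": 1},
]
unique_rounds = dedupe_rounds(rounds)
print([item["name"] for item in unique_rounds])
```

The deduplicated list can then be used in place of response['rounds'] when building objs.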