
Delete Duplicate Rows in Django DB

I have a model where, because of a code bug, there are duplicate rows. I now need to delete the duplicates from the database.

Every row should have a unique photo_id. Is there a simple way to remove the duplicates? Or do I need to do something like this:

    rows = MyModel.objects.all()
    for row in rows:
        try:
            MyModel.objects.get(photo_id=row.photo_id)
        except MyModel.MultipleObjectsReturned:
            # More than one row shares this photo_id, so drop this one.
            row.delete()
Brenden asked Jan 22 '12 22:01



2 Answers

The simplest way is the simplest way! Especially for one-off scripts where performance doesn't matter (unless it does). Since it's not core code, I'd just write the first thing that comes to mind and works.

    # assuming which duplicate is removed doesn't matter...
    # .reverse() only has an effect on an ordered queryset, so order by pk first.
    for row in MyModel.objects.order_by('pk').reverse():
        if MyModel.objects.filter(photo_id=row.photo_id).count() > 1:
            row.delete()

Use .reverse() to delete the later duplicates first and keep the first instance of each photo_id, rather than the last. It only has an effect on a queryset with a defined ordering.

As always, back up before you do this stuff.
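If the per-row count() query turns out to be too slow, the same "keep the first, delete the rest" idea can be done in a single pass. The following is only a sketch on plain tuples rather than real model instances; the helper name pks_to_delete and the sample data are hypothetical, not part of the answer above.

```python
def pks_to_delete(rows):
    """Return the primary keys of rows whose photo_id was already seen.

    `rows` is an iterable of (pk, photo_id) pairs. The first pair seen for
    each photo_id is kept; every later pair with that photo_id is reported.
    """
    seen = set()
    doomed = []
    for pk, photo_id in rows:
        if photo_id in seen:
            doomed.append(pk)   # a row with this photo_id came earlier
        else:
            seen.add(photo_id)  # first occurrence, keep it
    return doomed


# Plain tuples stand in for database rows here (hypothetical data).
sample = [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "b"), (6, "a")]
duplicates = pks_to_delete(sample)
print(duplicates)  # [3, 5, 6]
```

In a Django project the pairs could presumably come from MyModel.objects.order_by('pk').values_list('pk', 'photo_id'), and the result could be passed to MyModel.objects.filter(pk__in=duplicates).delete(). Try it on a copy of the data first.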

Yuji 'Tomita' Tomita answered Oct 16 '22 15:10


This may be faster because it avoids the inner filter for each row in MyModel.

Since each photo_id should be unique, we can sort the rows by photo_id in increasing order and keep track of the last photo_id we saw. As we walk over the rows, any row with the same photo_id as the previous one must be a duplicate, so we can delete it.

    last_seen_id = float('-inf')
    rows = MyModel.objects.all().order_by('photo_id')

    for row in rows:
        if row.photo_id == last_seen_id:
            row.delete()  # We've seen this id in a previous row
        else:
            # New id found, save it and check future rows for duplicates.
            last_seen_id = row.photo_id
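To try the sorted walk without a Django project, here is a rough sketch using the standard library's sqlite3 module and an in-memory database. The table name photos, its columns, and the sample values are all made up for illustration. It orders by photo_id and then id, so the row kept for each photo_id is the one with the lowest id.

```python
import sqlite3

# In-memory database with a hypothetical table standing in for MyModel.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, photo_id INTEGER)")
conn.executemany(
    "INSERT INTO photos (photo_id) VALUES (?)",
    [(10,), (20,), (10,), (30,), (20,), (10,)],
)

# Fetch everything up front so deleting does not disturb the iteration.
rows = conn.execute(
    "SELECT id, photo_id FROM photos ORDER BY photo_id, id"
).fetchall()

last_seen = object()  # sentinel that compares unequal to any real photo_id
deleted = 0
for row_id, photo_id in rows:
    if photo_id == last_seen:
        # Same photo_id as the previous row in sorted order: a duplicate.
        conn.execute("DELETE FROM photos WHERE id = ?", (row_id,))
        deleted += 1
    else:
        last_seen = photo_id

remaining = conn.execute("SELECT id, photo_id FROM photos ORDER BY id").fetchall()
print(deleted, remaining)  # 3 [(1, 10), (2, 20), (4, 30)]
```

The secondary sort on id is a choice made for this sketch. Without it, which duplicate survives depends on the order the database happens to return rows within each photo_id.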
wolfes answered Oct 16 '22 13:10