Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL Duplicate rows

Tags:

sql

mysql

I have a table with some repeated information: Id, Name, Lastname, Birth, PersonalKey, Personal Info, Direction, Source.

Where source tells me where the information came from.

The repeated information has unique id, and I need to erase the duped information. But, I have priority over some Source information that i need to be the one that stays and the other erased.

Other thing is that another Source information have some information that the one that i want to stay doesnt have so i need to refill the PersonalKey to the one thats going to stay and erase the repeated ones.

Table named Pruebas

---Id, Name, Firstname, Lastname, Birth, RFC, Source, PersonalKey---
---2,Juan,Garcia,Escobeddo,1983-08-04,GAED87393, DRV484930, 34233--
---3,Juan,Garcia,Escobedo,1987-08-04,GAED87393, FIN484930, --
---4,Juan,Garcia,Escobedo,1987-08-04,GAED87393, SA484930, --

As you see:

  • The IDs are unique
  • The name, firstname and lastname are repeated
  • The id 2 has a PersonalKey value, but 3 and 4 don't
    • I want the one with the 'FIN%' source to stay and the other ones erased, but first I need to make sure the row that remains gets the PersonalKey value (IOW, I don't want to lose the PersonalKey value).

Thanks in advance.

like image 437
Granger Avatar asked Jan 24 '23 18:01

Granger


1 Answers

I would run a cursor (with MySQL SP programming language, Java, Python, .NET) on this query:

select Name, Firstname, Lastname, count(1)
  from Pruebas
 group by Name, Firstname, Lastname
having count(1) > 1

Then, on the returned rows from the cursor, just do whatever you need to: check for the FIN% instance, check for PersonalKey's presence, and update accordingly.

For each row on the cursor, you can open a different cursor with:

select *
  from Pruebas
 where Name = the_Name
   and Firstname = the_Firstname
   and Lastname = the_Lastname

And now, you will have a inner cursor with all the rows you will modify. If it is the one you need, keep it and update it with the KEY value you mentioned. Otherwise, delete it.

In Oracle, you could accomplish what you want in one query, but I don't think that way you'll get the same performance you would with this approach.

Hope it helps.

like image 159
Pablo Santa Cruz Avatar answered Feb 04 '23 05:02

Pablo Santa Cruz