
How can I delete duplicates in SQLite?

I have a SQLite DB where the statement:

SELECT messdatum, count(*) as anzahl 
from lipo 
GROUP BY Messdatum 
ORDER BY anzahl desc;

returns several rows with a count greater than 1, which indicates that I have duplicates with the same Messdatum. How can I delete only the duplicates from my SQLite DB? (It should delete anzahl-1 records for each messdatum.) Does anyone have any advice?

PS: I found the article "How to remove duplicate" from Microsoft, but I have problems adapting it to the SQLite dialect and got several syntax errors. For example, I could run:

 INSERT into holdkey SELECT messdatum, count(*) as anzahl from lipo group by messdatum having count(*) > 1;

 INSERT into holddups SELECT DISTINCT lipo.* from lipo, holdkey where lipo.Messdatum = holdkey.messdatum ;

 DELETE lipo from lipo, holdkey where lipo.messdatum = holdkey.messdatum;

The DELETE command raises an error. How can I do that? I tried to copy holdkey.anzahl into an additional column in lipo with

UPDATE lipo,holdkey set lipo.duplettenzahl = holdkey.anzahl WHERE lipo.messdatum = holdkey.messdatum ; 

but this is not possible either. If I had anzahl as dublettenzahl in lipo, I could delete all records from lipo where dublettenzahl > 0. Please help! Thanks

Walter Schrabmair asked Sep 17 '14 06:09





2 Answers

SQLite has a special column, ROWID, created on every table by default (you can switch it off using the WITHOUT ROWID modifier, but be very sure before doing so).

This means that we can identify specific rows within sets of duplicates, for example, finding the first entry for a value:

SELECT messdatum, MIN(ROWID) FROM lipo GROUP BY messdatum

So one way to remove duplicates might be this:

DELETE FROM lipo
WHERE rowid NOT IN (
  SELECT MIN(rowid) 
  FROM lipo 
  GROUP BY messdatum
)
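To try the DELETE above without touching real data, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The column wert and the sample rows are made up for illustration, since the question does not show the lipo schema; only lipo and messdatum come from the question.

```python
import sqlite3


def count_duplicate_groups(conn):
    """Return how many messdatum values occur more than once in lipo."""
    row = conn.execute(
        """
        SELECT COUNT(*) FROM (
            SELECT messdatum FROM lipo
            GROUP BY messdatum
            HAVING COUNT(*) > 1
        )
        """
    ).fetchone()
    return row[0]


def delete_duplicates(conn):
    """Keep the row with the smallest rowid per messdatum, delete the rest."""
    cur = conn.execute(
        """
        DELETE FROM lipo
        WHERE rowid NOT IN (
            SELECT MIN(rowid) FROM lipo GROUP BY messdatum
        )
        """
    )
    conn.commit()
    return cur.rowcount  # number of rows removed


# Demo on a throwaway in-memory database (schema is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lipo (messdatum TEXT, wert REAL)")
conn.executemany(
    "INSERT INTO lipo VALUES (?, ?)",
    [
        ("2014-09-01", 1.0),
        ("2014-09-01", 1.5),
        ("2014-09-01", 2.0),
        ("2014-09-02", 3.0),
        ("2014-09-03", 4.0),
        ("2014-09-03", 4.5),
    ],
)

before = count_duplicate_groups(conn)
deleted = delete_duplicates(conn)
after = count_duplicate_groups(conn)
print(before, deleted, after)
```

Which row survives in each group is decided only by rowid (normally the earliest inserted one). If a different row should be kept, the MIN(rowid) subquery would need to change accordingly.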
Mike Woodhouse answered Sep 29 '22 23:09



I found a solution:

 INSERT into holdkey SELECT messdatum, count(*) as anzahl,NameISO from lipo group by messdatum having count(*) > 1;
 INSERT into holddups SELECT DISTINCT lipo.*,1 from lipo, holdkey where lipo.Messdatum = holdkey.messdatum group by messdatum;
 INSERT into lipo_mit_dz  SELECT *, count(*) as DublettenZahl from lipo group by messdatum ORDER BY Dublettenzahl desc ;
 DELETE from lipo_mit_dz where Dublettenzahl > 1;
 INSERT into lipo_mit_dz SELECT * from holddups ; 
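The statements above build a cleaned copy of the data in a separate table. A more compact variant of that copy-into-a-new-table idea might look like the sketch below, which is not the method used in this answer: it selects one row per messdatum by smallest rowid and leaves the original table untouched. The target table name lipo_dedup and the column wert are hypothetical, as the real schema is not shown.

```python
import sqlite3


def copy_without_duplicates(db):
    """Create lipo_dedup holding one row per messdatum (smallest rowid wins)."""
    db.execute(
        """
        CREATE TABLE lipo_dedup AS
        SELECT * FROM lipo
        WHERE rowid IN (
            SELECT MIN(rowid) FROM lipo GROUP BY messdatum
        )
        """
    )
    db.commit()


# Demo on a throwaway in-memory database (schema is an assumption).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE lipo (messdatum TEXT, wert REAL)")
db.executemany(
    "INSERT INTO lipo VALUES (?, ?)",
    [
        ("2014-09-01", 1.0),
        ("2014-09-01", 1.5),
        ("2014-09-02", 3.0),
    ],
)

copy_without_duplicates(db)
original_rows = db.execute("SELECT COUNT(*) FROM lipo").fetchone()[0]
dedup_rows = db.execute(
    "SELECT messdatum, wert FROM lipo_dedup ORDER BY messdatum"
).fetchall()
print(original_rows, dedup_rows)
```

Note that CREATE TABLE ... AS SELECT copies column names and data but not constraints or indexes, so those would have to be recreated on the new table if they matter.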
Walter Schrabmair answered Sep 30 '22 00:09
