I have a SQLite DB where the statement:
SELECT messdatum, count(*) as anzahl
from lipo
GROUP BY Messdatum
ORDER BY anzahl desc;
returns some rows with anzahl greater than 1, which indicates that I have duplicates with the same Messdatum.
How can I delete only the duplicates from my SQLite DB? (It should delete anzahl-1 records for each messdatum that occurs more than once.) Does anyone have advice?
PS: I found the article "How to remove duplicate" from Microsoft, but I have trouble adapting it to the SQLite dialect and get syntax errors. For example, I can do:
INSERT into holdkey SELECT messdatum, count(*) as anzahl from lipo group by messdatum having count(*) > 1;
INSERT into holddups SELECT DISTINCT lipo.* from lipo, holdkey where lipo.Messdatum = holdkey.messdatum ;
DELETE lipo from lipo, holdkey where lipo.messdatum = holdkey.messdatum;
The DELETE command raises an error. How can I do that? I tried to copy holdkey.anzahl into an additional column in lipo with
UPDATE lipo,holdkey set lipo.dublettenzahl = holdkey.anzahl WHERE lipo.messdatum = holdkey.messdatum ;
but this is not possible either. If I had anzahl as dublettenzahl in lipo, I could delete all records from lipo where dublettenzahl > 0. Please help! Thanks
SQLite has a special column, ROWID, created on every table by default (you can switch it off using the WITHOUT ROWID modifier, but be very sure before doing so).
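To make the ROWID point concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table names with_rowid and no_rowid are made up for the demo and are not part of the question's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Ordinary table: SQLite gives every row an implicit integer rowid.
conn.execute("CREATE TABLE with_rowid (messdatum TEXT)")
conn.executemany(
    "INSERT INTO with_rowid VALUES (?)",
    [("2020-01-01",), ("2020-01-01",)],
)
rowids = [r[0] for r in conn.execute("SELECT rowid FROM with_rowid ORDER BY rowid")]

# WITHOUT ROWID table: requires an explicit PRIMARY KEY and has no rowid column,
# so selecting rowid fails.
conn.execute("CREATE TABLE no_rowid (messdatum TEXT PRIMARY KEY) WITHOUT ROWID")
try:
    conn.execute("SELECT rowid FROM no_rowid")
    rowid_available = True
except sqlite3.OperationalError:
    rowid_available = False
```

Even two rows with identical messdatum get distinct rowids in the ordinary table, which is what lets us tell duplicates apart.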
This means that we can identify specific rows within sets of duplicates, for example by finding the first entry for each value:
SELECT messdatum, MIN(ROWID) FROM lipo GROUP BY messdatum;
So one way to remove duplicates might be this:
DELETE FROM lipo
WHERE rowid NOT IN (
SELECT MIN(rowid)
FROM lipo
GROUP BY messdatum
)
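As a quick check of the DELETE above, here is a small sketch run through Python's sqlite3 module on an in-memory database. The column wert and the sample rows are invented for illustration; only lipo and messdatum come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'wert' is a hypothetical payload column; the real lipo table will differ.
conn.execute("CREATE TABLE lipo (messdatum TEXT, wert REAL)")
conn.executemany(
    "INSERT INTO lipo VALUES (?, ?)",
    [
        ("2020-01-01", 1.0),
        ("2020-01-01", 1.5),
        ("2020-01-02", 2.0),
        ("2020-01-01", 1.7),
        ("2020-01-03", 3.0),
        ("2020-01-03", 3.5),
    ],
)

# Keep the row with the lowest rowid per messdatum and delete the rest.
cur = conn.execute(
    """
    DELETE FROM lipo
    WHERE rowid NOT IN (
        SELECT MIN(rowid)
        FROM lipo
        GROUP BY messdatum
    )
    """
)
deleted = cur.rowcount
conn.commit()

remaining = conn.execute(
    "SELECT messdatum, wert FROM lipo ORDER BY messdatum"
).fetchall()

# Re-run the duplicate check from the question; it should find nothing now.
still_duplicated = conn.execute(
    "SELECT messdatum FROM lipo GROUP BY messdatum HAVING count(*) > 1"
).fetchall()
```

Because MIN(rowid) is kept, the surviving row for each messdatum is the one that was inserted first in this sample. Using MAX(rowid) instead would keep the most recently inserted one.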
I found a solution:
INSERT into holdkey SELECT messdatum, count(*) as anzahl,NameISO from lipo group by messdatum having count(*) > 1;
INSERT into holddups SELECT DISTINCT lipo.*,1 from lipo, holdkey where lipo.Messdatum = holdkey.messdatum group by messdatum;
INSERT into lipo_mit_dz SELECT *, count(*) as DublettenZahl from lipo group by messdatum ORDER BY Dublettenzahl desc ;
DELETE from lipo_mit_dz where Dublettenzahl > 1;
INSERT into lipo_mit_dz SELECT * from holddups ;
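The net effect of these steps seems to be a table lipo_mit_dz with one row per messdatum. A condensed sketch of that idea, using Python's sqlite3 module, might look like the following. It assumes a simplified lipo table with an invented wert column, and it relies on SQLite accepting non-aggregated columns in a GROUP BY query, in which case the row taken from each group is arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the real table; 'wert' is a hypothetical column.
conn.execute("CREATE TABLE lipo (messdatum TEXT, wert REAL)")
conn.executemany(
    "INSERT INTO lipo VALUES (?, ?)",
    [
        ("2020-01-01", 1.0),
        ("2020-01-01", 1.5),
        ("2020-01-02", 2.0),
        ("2020-01-03", 3.0),
        ("2020-01-03", 3.5),
    ],
)

# One row per messdatum; which duplicate survives is not controlled here.
conn.execute(
    """
    CREATE TABLE lipo_mit_dz AS
    SELECT *, 1 AS DublettenZahl
    FROM lipo
    GROUP BY messdatum
    """
)

dates = [
    r[0]
    for r in conn.execute("SELECT messdatum FROM lipo_mit_dz ORDER BY messdatum")
]
```

Unlike the rowid-based DELETE, this leaves the original lipo table untouched, so the result can be inspected before anything is replaced.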