Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I delete duplicate rows in sqlite where every column including the ID is dupplicate? [duplicate]

Tags:

sql

sqlite

I am using sqlite. I am importing a dataset where the ID is assigned externally into a temporary table before I insert them into my permanent table. Permanent table uses the external id (RunId) and has no other id column.

I import the table from csv into a new table Book1 where C15 is the ID column. Then the I run the insert:

INSERT INTO PrimusRuns (RunId, TransientName, RunDateStart, RunType, TestDateStart,
                    Gross, CPS, Shares, MaxExposure, PercentWin, Duration)
SELECT a.C15, a.C1, JULIANDAY(a.C2), a.C3,JULIANDAY(a.C4),
        a.C6, a.C8, a.C9, a.C10, a.C11, a.C14
FROM Book1 as a;

however I get a primary key constraint error:

[19] [SQLITE_CONSTRAINT_PRIMARYKEY] A PRIMARY KEY constraint failed (UNIQUE constraint failed: PrimusRuns.RunID)

First I thought some of these rows are already in the table however running:

SELECT * FROM Book1 WHERE C15 IN(
  SELECT RunID from PrimusRuns
);

returns nothing.

Then I realized there are duplicate rows in the import when I ran:

SELECT * FROM Book1 GROUP BY C15 HAVING COUNT(*) > 1

This aggregate query return 95 rows meaning there's at least 95 rows I must delete. How to I delete tell it to delete the duplicates?

NOTE : There are other questions similar to this, however my question is different in that the id is also a duplicate. Other questions group all the rest of the column and delete the max(id). But in my case max id returns both rows not just one.

like image 514
Behrooz Karjoo Avatar asked Nov 08 '22 10:11

Behrooz Karjoo


1 Answers

For what is related only at the deletion of the duplicated row reffering to C15 you could find all the min(id) group by C15 .. so this are the single row for each C15 value and the delete the others eg

  delete from 
  book1
  where id not in ( 
  select min(id)
  from Book1
  group by C15) 

but if you have a full exactly duplicated rows you could do useìin these steps ..

1 ) You can create a temporary table with the distinct result for duplicated eg:

create table my_temp_distinct  as  
select col1, col2 ... 
from Book1
group by col1, col2, ... 
having count(*)> 1

2 ) then delete all the rows with duplicated rows

  delete from 
  book1
  where id in ( 
  select min(id)
  from Book1
  group by C15) 

3 ) and last insert using select form the parked rows

insert into Book1 (col1, col2 ....)
select col1, col2, ...
from my_temp_distinct
like image 193
ScaisEdge Avatar answered Nov 15 '22 07:11

ScaisEdge