
How to convert SQL statement "delete from TABLE where someID not in (select someID from Table group by property1, property2)" to Core Data

Tags: sql, ios, core-data

I'm trying to convert the following SQL statement to Core Data:

delete from SomeTable
where someID not in (
    select someID
    from SomeTable
    group by property1, property2, property3
)

Basically, I want to retrieve and delete possible duplicates in a table, where a record is deemed a duplicate if its property1, property2 and property3 are all equal to those of another record.

How can I do that?

PS: As the title says, I'm trying to convert the above SQL statement into iOS Core Data methods, not to improve, correct or comment on the SQL itself; that is beside the point.

Thank you.

nemesys asked Dec 14 '22 13:12


2 Answers

It sounds like you are asking for SQL to accomplish your objective. Your starting query won't do what you describe, and most databases wouldn't accept it at all, because the grouped subquery selects someID, a column that is neither in the GROUP BY list nor wrapped in an aggregate function.

UPDATE

I had initially thought the request was to delete all members of each group containing dupes, and wrote code accordingly. Having reinterpreted the original SQL as MySQL would do, it seems the objective is to retain exactly one element for each combination of (property1, property2, property3). I guess that makes more sense anyway. Here is a standard way to do that:

delete from SomeTable
where someID not in (
    select min(st2.someId)
    from SomeTable st2
    group by property1, property2, property3
)

That's distinguished from the original by use of the min() aggregate function to choose a specific one of the someId values to retain from each group. This should work, too:

delete from SomeTable
where someID in (
  select st3.someId
  from SomeTable st2
    join SomeTable st3
      on st2.property1 = st3.property1
        and st2.property2 = st3.property2
        and st2.property3 = st3.property3
  where st2.someId < st3.someId
)

These two queries will retain the same rows, provided none of the three properties is NULL: GROUP BY puts NULLs into one group, whereas the join's equality comparisons never match them, so the second query leaves duplicates with NULL properties alone. I like the second better, even though it's longer, because NOT IN can be slow when it is picking a small number of rows out of a large set. If you anticipate having enough rows to be concerned about scaling, though, then you should try both, and perhaps look into optimizations (for example, an index on (property1, property2, property3)) and other alternatives.
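
One way to check that the two deletes keep the same rows is to run each against a small in-memory SQLite table. The sketch below uses Python's standard sqlite3 module; the sample rows and the helper name surviving_ids are made up for illustration, the unused st1 alias is left out, and other database engines may behave differently.

```python
import sqlite3

SCHEMA = """
create table SomeTable (
    someID integer primary key,
    property1 text, property2 text, property3 text
)
"""

# Hypothetical sample data: ids 2 and 4 duplicate id 1, id 5 duplicates id 3.
ROWS = [
    (1, "a", "x", "p"),
    (2, "a", "x", "p"),
    (3, "b", "y", "q"),
    (4, "a", "x", "p"),
    (5, "b", "y", "q"),
    (6, "c", "z", "r"),
]

# First alternative: keep the minimum someID of every group.
NOT_IN_MIN = """
delete from SomeTable
where someID not in (
    select min(st2.someID)
    from SomeTable st2
    group by property1, property2, property3
)
"""

# Second alternative: delete every row that has a lower-id twin.
IN_SELF_JOIN = """
delete from SomeTable
where someID in (
    select st3.someID
    from SomeTable st2
    join SomeTable st3
      on st2.property1 = st3.property1
     and st2.property2 = st3.property2
     and st2.property3 = st3.property3
    where st2.someID < st3.someID
)
"""

def surviving_ids(delete_sql):
    """Load the sample rows into a fresh database, run one delete, return what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("insert into SomeTable values (?, ?, ?, ?)", ROWS)
    conn.execute(delete_sql)
    ids = [row[0] for row in conn.execute("select someID from SomeTable order by someID")]
    conn.close()
    return ids
```

On this sample data, surviving_ids(NOT_IN_MIN) and surviving_ids(IN_SELF_JOIN) both return [1, 3, 6].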

As for writing it in terms of Core Data calls, however, I don't think you exactly can. Core Data does support grouping, so you could write Core Data calls that perform the subquery in the first alternative and return you the entity objects or their IDs, grouped as described. You could then iterate over the groups, skip the first element of each, and call Core Data deletion methods for all the rest. The details are out of scope for the SO format.
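
The shape of that loop (fetch, group, skip the first of each group, delete the rest) can be sketched outside Core Data. The version below is only an illustration in Python with the standard sqlite3 module, not Core Data code, and the function name delete_duplicates is invented here. In Core Data, the fetch would presumably be an NSFetchRequest with equivalent sort descriptors, and each removal a call to the managed object context's delete method.

```python
import sqlite3
from itertools import groupby

def delete_duplicates(conn):
    """Keep the lowest someID in each (property1, property2, property3) group.

    Returns the list of deleted ids.
    """
    # Ordering by the three properties makes group members adjacent;
    # ordering by someID last puts the row to keep first in its group.
    rows = conn.execute(
        "select someID, property1, property2, property3 from SomeTable "
        "order by property1, property2, property3, someID"
    ).fetchall()

    doomed = []
    for _key, group in groupby(rows, key=lambda r: r[1:]):
        next(group)                          # skip the first element, the survivor
        doomed.extend(r[0] for r in group)   # everything after it is a duplicate

    # One delete per leftover record, mirroring per-object deletion.
    conn.executemany("delete from SomeTable where someID = ?", [(i,) for i in doomed])
    return doomed
```

Every row is pulled into application memory before anything is deleted, which is the time and memory cost that an ORM-side implementation carries.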

I have to say, though, that doing such a job in Core Data is going to be far more costly than doing it directly in the database, both in time and in required memory. Doing it directly in the database is not friendly to an ORM framework such as Core Data, however. This sort of thing is one of the tradeoffs you've chosen by going with an ORM framework.

I'd recommend that you try to avoid the need to do this at all. Define a unique index on SomeTable(property1, property2, property3) and do whatever you need to do to avoid creating duplicates in the first place, or to recover gracefully from a (failed) attempt to do so.
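
As a rough illustration of that advice at the database level, here is a sketch using Python's standard sqlite3 module. The index name SomeTable_props and the helper insert_if_new are hypothetical. A Core Data model would more likely declare a uniqueness constraint on the entity than create an index by hand, so treat this as the SQL-side idea only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table SomeTable ("
    " someID integer primary key,"
    " property1 text, property2 text, property3 text)"
)
# The unique index makes the database itself reject a second row
# with the same (property1, property2, property3) combination.
conn.execute(
    "create unique index SomeTable_props "
    "on SomeTable (property1, property2, property3)"
)

def insert_if_new(conn, p1, p2, p3):
    """Return True if the row was stored, False if it would have been a duplicate."""
    try:
        conn.execute(
            "insert into SomeTable (property1, property2, property3) values (?, ?, ?)",
            (p1, p2, p3),
        )
        return True
    except sqlite3.IntegrityError:
        # The failed attempt is handled here instead of leaving a duplicate behind.
        return False
```

With the index in place, a duplicate never reaches the table, so no cleanup pass is needed afterwards.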

John Bollinger answered Apr 27 '23 18:04


DELETE SomeTable 
FROM SomeTable
LEFT OUTER JOIN (
   SELECT MIN(RowId) as RowId, property1, property2, property3 
   FROM SomeTable 
   GROUP BY property1, property2, property3
) as KeepRows ON
   SomeTable.RowId = KeepRows.RowId
WHERE
   KeepRows.RowId IS NULL

Deep Kalra answered Apr 27 '23 17:04