I'm trying to convert the following SQL statement to Core Data:
delete from SomeTable
where someID not in (
select someID
from SomeTable
group by property1, property2, property3
)
Basically, I want to retrieve and delete possible duplicates in a table where a record is deemed a duplicate if property1, property2 and property3 are equal to another record.
How can I do that?
PS: As the title says, I'm trying to convert the above SQL statement into iOS Core Data methods, not trying to improve, correct or comment on the above SQL, that is beyond the point.
Thank you.
It sounds like you are asking for SQL to accomplish your objective. Your starting query won't do what you describe, and most databases wouldn't accept it at all on account of the aggregate subquery attempting to select a column that is not a function of the groups.
UPDATE
I had initially thought the request was to delete all members of each group containing dupes, and wrote code accordingly. Having reinterpreted the original SQL as MySQL would do, it seems the objective is to retain exactly one element for each combination of (property1, property2, property3)
. I guess that makes more sense anyway. Here is a standard way to do that:
delete from SomeTable st1
where someID not in (
select min(st2.someId)
from SomeTable st2
group by property1, property2, property3
)
That's distinguished from the original by use of the min()
aggregate function to choose a specific one of the someId
values to retain from each group. This should work, too:
delete from SomeTable st1
where someID in (
select st3.someId
from SomeTable st2
join SomeTable st3
on st2.property1 = st3.property1
and st2.property2 = st3.property2
and st2.property3 = st3.property3
where st2.someId < st3.someId
)
These two queries will retain the same rows. I like the second better, even though it's longer, because the NOT IN
operator is kinda nasty for choosing a small number of elements from a large set. If you anticipate having enough rows to be concerned about scaling, though, then you should try both, and perhaps look into optimizations (for example, an index on (property1, property2, property3)
) and other alternatives.
As for writing it in terms of Core Data calls, however, I don't think you exactly can. Core Data does support grouping, so you could write Core Data calls that perform the subquery in the first alternative and return you the entity objects or their IDs, grouped as described. You could then iterate over the groups, skip the first element of each, and call Core Data deletion methods for all the rest. The details are out of scope for the SO format.
I have to say, though, that doing such a job in Core Data is going to be far more costly than doing it directly in the database, both in time and in required memory. Doing it directly in the database is not friendly to an ORM framework such as Core Data, however. This sort of thing is one of the tradeoffs you've chosen by going with an ORM framework.
I'd recommend that you try to avoid the need to do this at all. Define a unique index on SomeTable(property1, property2, property3)
and do whatever you need to do to avoid trying to creating duplicates or to gracefully recover from a (failed) attempt to do so.
DELETE SomeTable
FROM SomeTable
LEFT OUTER JOIN (
SELECT MIN(RowId) as RowId, property1, property2, property3
FROM SomeTable
GROUP BY property1, property2, property3
) as KeepRows ON
SomeTable.RowId = KeepRows.RowId
WHERE
KeepRows.RowId IS NULL
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With