Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL: Delete all NOT MAX Records in GroupBy

Tags:

sql

My goal is to delete all records from my table that are NOT the MAX(recordDate) of a grouped CaseKey. So if I have 9 records with 3 sets of 3 casekeys, and each casekey has its 3 dates. I'd delete the 2 lower dates of each set and come up with 3 total records, only the MAX(recordDate) of each remaining.

I have the following SQL Query:

    DELETE FROM table
    WHERE tableID NOT IN (
    SELECT tableID
    FROM (
    Select MAX(recordDate) As myDate, tableID From table
    Group By CaseKey
    ) As foo
    )

I receive the error: Error on Line 3... Column 'table.tableID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

Obviously I could add tableID to my Group By clause, but then the result of that statement is incorrect and returns all rows instead of just returning the MAX recordDate of the grouped CaseKeys.

Server is down right now, but the apparent answer is: (tiny tweak from WildPlasser's answer)

DELETE zt FROM ztable zt
WHERE EXISTS (
    SELECT * FROM ztable ex
    WHERE ex.CaseKey = zt.CaseKey
    AND ex.recordDate > zt.recordDate
);

In other words, for each record in zt, run a query to see if the same record also has a record with a higher recordDate. If so, the WHERE EXISTS statement passes and the record is deleted, otherwise the WHERE statement fails and the record is its own MAX recordDate.

Thank you, WildPlasser, for that simplistic methodology that I was somehow blowing up.

like image 581
Suamere Avatar asked Jan 10 '13 19:01

Suamere


1 Answers

There is one special property of MAX: there is no record with a higher value than max. So we can delete all the records for which a record with the same CaseKey, but with a higher recordDate exists:

DELETE FROM ztable zt
WHERE EXISTS (
    SELECT *
    FROM ztable ex
    WHERE ex.CaseKey = zt.CaseKey
    AND ex.recordDate > zt.recordDate
    );

BTW: The above query (as well as the MAX() version) assumes that there is only one record with the maximum date. There could be ties.

In the case of ties, you'll need to add an extra field to the where clause; as a tie-breaker. Assuming that TableId can function as such, the query would become:

DELETE FROM ztable zt
WHERE EXISTS (
    SELECT *
    FROM ztable ex
    WHERE ex.CaseKey = zt.CaseKey
    AND (   ex.recordDate > zt.recordDate
        OR (ex.recordDate = zt.recordDate AND ex.TableId > zt.TableId)
        )
    );
like image 129
wildplasser Avatar answered Oct 04 '22 04:10

wildplasser