Delete all records except the most recent one?


I have two DB tables in a one-to-many relationship. The data looks like this:

select * from student, application

Resultset:

+-----------+---------------+---------------------+
| StudentID | ApplicationID | ApplicationDateTime |
+-----------+---------------+---------------------+
| 1         | 20001         | 12 April 2011       |
| 1         | 20002         | 15 May 2011         |
| 2         | 20003         | 02 Feb 2011         |
| 2         | 20004         | 13 March 2011       |
| 2         | 20005         | 05 June 2011        |
+-----------+---------------+---------------------+

I want to delete all applications except for the most recent one. In other words, each student must only have one application linked to it. Using the above example, the data should look like this:

+-----------+---------------+---------------------+
| StudentID | ApplicationID | ApplicationDateTime |
+-----------+---------------+---------------------+
| 1         | 20002         | 15 May 2011         |
| 2         | 20005         | 05 June 2011        |
+-----------+---------------+---------------------+

How would I go about constructing my DELETE statement to filter out the correct records?


asked Aug 30 '11 05:08

sim

2 Answers

DELETE FROM application
 WHERE ApplicationDateTime <> (SELECT max(a2.ApplicationDateTime)
                                 FROM application a2
                                WHERE a2.StudentID = application.StudentID)

Given the long discussion in the comments, please note the following:

The above statement will work on any database that properly implements statement-level read consistency, regardless of any changes made to the table while the statement is running.

Databases where I definitely know that this works correctly even with concurrent modifications to the table: Oracle (the one this question is about), Postgres, SAP HANA, Firebird (and most probably MySQL using InnoDB). They all guarantee a consistent view of the data as of the point in time when the statement started. Changing the <> to < will not change anything for them.

For the above-mentioned databases, the statement is not affected by the isolation level, because phantom reads or non-repeatable reads can only happen between multiple statements, not within a single statement.

For databases that do not implement MVCC properly and rely on locking to manage concurrency (thus blocking concurrent write access), this might actually yield wrong results if the table is updated concurrently. For those, the workaround using < is probably needed.
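For anyone who wants to try the < variant on the question's sample data, here is a minimal sketch using Python's built-in sqlite3 module. It rests on several assumptions that are not in the question: SQLite stands in for Oracle, the date columns live in a table named application, the dates are stored as ISO-8601 text so that max() and < compare them chronologically, and the helper name keep_latest_with_subquery is made up for illustration. It only shows the single-session result and says nothing about concurrent behaviour.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO-8601 text
# (an assumption) so that string comparison matches chronological order.
SAMPLE_ROWS = [
    (1, 20001, "2011-04-12"),
    (1, 20002, "2011-05-15"),
    (2, 20003, "2011-02-02"),
    (2, 20004, "2011-03-13"),
    (2, 20005, "2011-06-05"),
]


def keep_latest_with_subquery(rows):
    """Load rows into an in-memory table, run the correlated-subquery
    DELETE with <, and return the (StudentID, ApplicationID) pairs left."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE application ("
        " StudentID INTEGER,"
        " ApplicationID INTEGER PRIMARY KEY,"
        " ApplicationDateTime TEXT)"
    )
    conn.executemany("INSERT INTO application VALUES (?, ?, ?)", rows)
    # Delete every application that is older than the newest one
    # belonging to the same student.
    conn.execute(
        """
        DELETE FROM application
         WHERE ApplicationDateTime < (SELECT max(a2.ApplicationDateTime)
                                        FROM application a2
                                       WHERE a2.StudentID = application.StudentID)
        """
    )
    remaining = conn.execute(
        "SELECT StudentID, ApplicationID FROM application"
        " ORDER BY StudentID, ApplicationID"
    ).fetchall()
    conn.close()
    return remaining


print(keep_latest_with_subquery(SAMPLE_ROWS))
```

On the sample data this leaves applications 20002 and 20005. Note that with either <> or <, two applications of one student that share the exact same maximum timestamp would both survive.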


answered Oct 01 '22 10:10

a_horse_with_no_name

You can use row_number() (or rank() or dense_rank(), or even just the rownum pseudocolumn) to apply an order to the records, and then use that order to decide which to discard. In this case, ordering by applicationdatetime desc gives the application with the most recent date for each student the rank of 1:

select studentid, applicationid from (
    select studentid, applicationid,
        row_number() over (partition by studentid
            order by applicationdatetime desc) as rn
    from application
)
where rn = 1;

 STUDENTID APPLICATIONID
---------- -------------
         1         20002
         2         20005

You can then delete anything with a rank higher than 1, which will preserve the records you care about:

delete from application
where (studentid, applicationid) in (
    select studentid, applicationid from (
        select studentid, applicationid,
            row_number() over (partition by studentid
                order by applicationdatetime desc) as rn
        from application
    )
    where rn > 1
);

3 rows deleted.
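One property of row_number() worth seeing in action is that it keeps exactly one row per student even when two applications share the same timestamp. The sketch below is a rough illustration of that, not a tested Oracle script. Its assumptions: SQLite (3.25 or later, for window functions) via Python's sqlite3 module stands in for Oracle, dates are stored as ISO-8601 text, student 3 and its two tied rows are invented for the example, and the extra applicationid desc sort key is my addition to make the tie-break deterministic. The helper name keep_latest_with_row_number is hypothetical.

```python
import sqlite3

# Sample rows from the question plus a hypothetical student 3 whose two
# applications share one timestamp. Dates are ISO-8601 text (an assumption).
ROWS_WITH_TIE = [
    (1, 20001, "2011-04-12"),
    (1, 20002, "2011-05-15"),
    (2, 20003, "2011-02-02"),
    (2, 20004, "2011-03-13"),
    (2, 20005, "2011-06-05"),
    (3, 20006, "2011-07-01"),
    (3, 20007, "2011-07-01"),
]


def keep_latest_with_row_number(rows):
    """Run the row_number()-based DELETE on an in-memory table and
    return the (studentid, applicationid) pairs that remain."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE application ("
        " studentid INTEGER,"
        " applicationid INTEGER PRIMARY KEY,"
        " applicationdatetime TEXT)"
    )
    conn.executemany("INSERT INTO application VALUES (?, ?, ?)", rows)
    # Rank each student's applications newest first; the second sort key
    # (an addition for this sketch) breaks ties on the higher applicationid.
    conn.execute(
        """
        DELETE FROM application
         WHERE (studentid, applicationid) IN (
               SELECT studentid, applicationid FROM (
                      SELECT studentid, applicationid,
                             row_number() OVER (PARTITION BY studentid
                                 ORDER BY applicationdatetime DESC,
                                          applicationid DESC) AS rn
                        FROM application
               )
                WHERE rn > 1
         )
        """
    )
    remaining = conn.execute(
        "SELECT studentid, applicationid FROM application"
        " ORDER BY studentid, applicationid"
    ).fetchall()
    conn.close()
    return remaining


print(keep_latest_with_row_number(ROWS_WITH_TIE))
```

With the tie-break above, the survivors are 20002, 20005 and 20007. Swapping row_number() for rank() or dense_rank() would give both tied rows rank 1 and keep them both, so the choice of function matters when timestamps can collide.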

answered Oct 01 '22 11:10

Alex Poole
