Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get top 1 row of each group

I have a table which I want to get the latest entry for each group. Here's the table:

DocumentStatusLogs Table

|ID| DocumentID | Status | DateCreated | | 2| 1          | S1     | 7/29/2011   | | 3| 1          | S2     | 7/30/2011   | | 6| 1          | S1     | 8/02/2011   | | 1| 2          | S1     | 7/28/2011   | | 4| 2          | S2     | 7/30/2011   | | 5| 2          | S3     | 8/01/2011   | | 6| 3          | S1     | 8/02/2011   | 

The table will be grouped by DocumentID and sorted by DateCreated in descending order. For each DocumentID, I want to get the latest status.

My preferred output:

| DocumentID | Status | DateCreated | | 1          | S1     | 8/02/2011   | | 2          | S3     | 8/01/2011   | | 3          | S1     | 8/02/2011   | 
  • Is there any aggregate function to get only the top from each group? See pseudo-code GetOnlyTheTop below:

    SELECT   DocumentID,   GetOnlyTheTop(Status),   GetOnlyTheTop(DateCreated) FROM DocumentStatusLogs GROUP BY DocumentID ORDER BY DateCreated DESC 
  • If such function doesn't exist, is there any way I can achieve the output I want?

  • Or at the first place, could this be caused by unnormalized database? I'm thinking, since what I'm looking for is just one row, should that status also be located in the parent table?

Please see the parent table for more information:

Current Documents Table

| DocumentID | Title  | Content  | DateCreated | | 1          | TitleA | ...      | ...         | | 2          | TitleB | ...      | ...         | | 3          | TitleC | ...      | ...         | 

Should the parent table be like this so that I can easily access its status?

| DocumentID | Title  | Content  | DateCreated | CurrentStatus | | 1          | TitleA | ...      | ...         | s1            | | 2          | TitleB | ...      | ...         | s3            | | 3          | TitleC | ...      | ...         | s1            | 

UPDATE I just learned how to use "apply" which makes it easier to address such problems.

like image 828
dpp Avatar asked Jul 27 '11 08:07

dpp


People also ask

How do I get the top 1 record for each group in SQL?

To do that, you can use the ROW_NUMBER() function. In OVER() , you specify the groups into which the rows should be divided ( PARTITION BY ) and the order in which the numbers should be assigned to the rows ( ORDER BY ). You assign the row numbers within each group (i.e., year).

How do you get top and rows per group?

Selecting a top n records for each category from any table, can be done easily using row_number function which generates a sequential integer to each row within a partition of a result set.


1 Answers

;WITH cte AS (    SELECT *,          ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn    FROM DocumentStatusLogs ) SELECT * FROM cte WHERE rn = 1 

If you expect 2 entries per day, then this will arbitrarily pick one. To get both entries for a day, use DENSE_RANK instead

As for normalised or not, it depends if you want to:

  • maintain status in 2 places
  • preserve status history
  • ...

As it stands, you preserve status history. If you want latest status in the parent table too (which is denormalisation) you'd need a trigger to maintain "status" in the parent. or drop this status history table.

like image 119
gbn Avatar answered Nov 04 '22 22:11

gbn