I have a table with sample data like this:
ID Key User
-- ---- -----
1 a test
2 ab test
3 abc test
4 abcd test
5 e test1
6 ef test1
7 efg test1
8 efgh test1
9 t test1
10 ty test1
11 tyu test1
12 tyui test1
The data consists of sequential "snapshots" of a value as it is being constructed by a user. I want to return the last row for each instance of the user building up a distinct, final Key
value. Notice how most rows' Key
contain the entire previous row's Key
plus an additional letter? I only want the rows that terminate such a sequence and are the longest value possible in each chain of Keys
that successively contain the previous Key
value.
The above sample data should return the following:
ID Key User
-- ---- -----
4 abcd test
8 efgh test1
12 tyui test1
How do I do this?
1) You can use MAX or MIN along with OVER clause and add extra condition to it. The extra condition is "ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING" which will fetch previous row value. Check this: SELECT *,MIN(JoiningDate) OVER (ORDER BY JoiningDate ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS EndDate.
METHOD 1 : Using LIMIT clause in descending orderof specified rows from specifies row. We will retrieve last 5 rows in descending order using LIMIT and ORDER BY clauses and finally make the resultant rows ascending. Since Employee table has IDs, we will perform ORDER BY ID in our query.
In SQL, to fetch data from multiple tables, the join operator is used. The join operator adds or removes rows in the virtual table that is used by SQL server to process data before the other steps of the query consume the data.
Comparison of columns in the same table is possible with the help of joins. Here we are comparing all the customers that are in the same city using the self join in SQL. Self-join is a regular join where a table is joined by itself. Similarly, a table may be joined with left join, right join, inner join, and full join.
Without answers to my questions, I had to make these assumptions:
ID
column represents chronology and always increases by one with no gaps.(Update: I made a small tweak that makes this work with "interleaved" data from different users, and added some interleaved and some tricky data to my fiddle.)
So here is my stab at a solution. See it in a SqlFiddle. Of note is that it simulates the LEAD
analytic from SQL Server 2012 without a JOIN
.
WITH Info AS (
SELECT
Grp = Row_Number() OVER (PARTITION BY UserName ORDER BY ID, Which) / 2,
*
FROM
dbo.UserEntry U
CROSS JOIN (
SELECT 1 UNION ALL SELECT 2
) X (Which)
)
SELECT
ID = Max(V.ID),
DataKey = Max(V.DataKey),
UserName = Max(V.UserName)
FROM
Info I
OUTER APPLY (SELECT I.* WHERE Which = 2) V
WHERE I.Grp > 0
GROUP BY
I.UserName,
I.Grp
HAVING
Max(I.DataKey) NOT LIKE Min(I.DataKey) + '_';
Input:
INSERT dbo.UserEntry (ID, DataKey, UserName)
VALUES
(1, 'a', 'test'),
(2, 'ab', 'test'),
(3, 'e', 'test1'),
(4, 'ef', 'test1'),
(5, 'abc', 'test'),
(6, 'abcd', 'test'),
(7, 'efg', 'test1'),
(8, 'efgh', 'test1'),
(9, 't', 'test1'),
(10, 'ty', 'test1'),
(11, 'tyu', 'test1'),
(12, 'tyui', 'test1'),
(13, 't', 'test1'),
(14, 'a', 'test'),
(15, 'a', 'test'),
(16, 'ab', 'test'),
(17, 'abc', 'test'),
(18, 'abcd', 'test'),
(19, 'to', 'test1'),
(20, 'abcde', 'test'),
(21, 'top', 'test1');
Output:
ID DataKey UserName
-- ------- --------
6 abcd test
8 efgh test1
12 tyui test1
14 a test
20 abcde test
21 top test1
Notes: I used different column names because using reserved words as column names is not best practice (it forces you to put square brackets around the names everywhere).
The technique I used will work with a single scan. It has no joins. A properly-constructed join-based query that utilizes appropriate indexes may outperform it slightly in CPU and time, but this solution will have the fewest reads for sure.
Update
While my query may be nice, the particular data structure in this problem lends itself to a very elegant solution I did not consider when I first answered. Thanks to Andriy for the basic idea, here is a dynamite and ultra-simple query (same fiddle as above).
WITH Info AS (
SELECT
Grp = Row_Number() OVER (PARTITION BY UserName ORDER BY ID) - Len(DataKey),
*
FROM
dbo.UserEntry U
)
SELECT
ID = Max(I.ID),
DataKey = Max(I.DataKey),
I.UserName
FROM
Info I
GROUP BY
I.UserName,
I.Grp;
Here's another approach:
Use the initial letter as a grouping criterion for a sequence.
Rank the rows separately for every user, sorting them by ID, and subtract the lengths of the Key values from the rankings. Use the results as another sequence grouping criterion.
Rank the rows again, partitioning them this time by user and the criteria from #1 and #2, and sorting them in the descending order of ID.
Fetch the rows with the rankings of 1.
Here's an implementation:
WITH partitioned AS (
SELECT
*,
SeqKey = LEFT([Key], 1),
SeqGrp = ROW_NUMBER() OVER (
PARTITION BY UserName
ORDER BY ID
) - LEN([Key])
FROM dbo.UserEntry
),
ranked AS (
SELECT
ID,
[Key],
UserName,
rnk = ROW_NUMBER() OVER (
PARTITION BY UserName, SeqKey, SeqGrp
ORDER BY ID DESC
)
FROM partitioned
)
SELECT
ID,
[Key],
UserName
FROM ranked
WHERE rnk = 1
;
Like with @ErikE's solution, it is assumed that the ID column defines the order of a sequence. However, the above query will still work correctly if there are gaps in ID values related to the same sequence.
You can try this query at SQL Fiddle as well. (Note: the demo uses @ErikE's DDL.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With