
How do you identify record pattern sequences in records using TSQL?

This is a fairly new exercise for me, but I need to find a way to identify pattern sequences within a table. For example, let's say I have a simple table that resembles the following:

(image: sample table, not preserved)

What I would like to do is identify and group all the records whose values form the consecutive sequence 5, 9, 6, and return them in a query. How would you accomplish this using T-SQL?

The results should look like this:

(image: expected result, not preserved)

I've looked for some potential examples of how this might be accomplished but couldn't find anything that really helps.

Mark asked Jul 29 '15 14:07





2 Answers

You can use the following query wrapped in a CTE in order to assign sequence numbers to the values contained in your sequence:

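;WITH Seq AS (
    -- k fixes the order of the pattern; v is the value expected at that position
    SELECT v, ROW_NUMBER() OVER(ORDER BY k) AS rn
    FROM (VALUES(1, 5), (2, 9), (3, 6)) x(k,v)
)
SELECT v, rn
FROM Seq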

Output:

v   rn
-------
5   1
9   2
6   3

Using the above CTE you can identify islands, i.e. slices of sequential rows containing the whole of the sequence:

;WITH Seq AS (
    SELECT v, ROW_NUMBER() OVER(ORDER BY k) AS rn
    FROM (VALUES(1, 5), (2, 9), (3, 6)) x(k,v)
), Grp AS (
SELECT [Key], [Value], 
       ROW_NUMBER() OVER (ORDER BY [Key]) - rn AS grp            
FROM mytable AS m
LEFT JOIN Seq AS s ON m.Value = s.v
)
SELECT *
FROM Grp

Output:

    Key Value   grp
   -----------------
    1   5       0
    2   9       0
    3   6       0
    6   5       3
    7   9       3
    8   6       3

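The grp field identifies exactly these islands: rows that continue the sequence in order share the same grp value. For rows whose Value is not part of the sequence, the LEFT JOIN yields rn = NULL, so grp is NULL as well.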
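To see why subtracting rn from the row position works, here is a minimal, self-contained sketch. The table variable @t and its six rows are made-up sample data (the table from the question is only available as an image), so treat both as assumptions rather than the original data.

```sql
-- Hypothetical sample data; NOT the table from the question.
DECLARE @t TABLE ([Key] INT PRIMARY KEY, [Value] INT NOT NULL);
INSERT INTO @t ([Key], [Value])
VALUES (1, 5), (2, 9), (3, 6), (4, 7), (5, 5), (6, 9);

;WITH Seq AS (
    -- Position of each value inside the pattern 5, 9, 6
    SELECT v, ROW_NUMBER() OVER (ORDER BY k) AS rn
    FROM (VALUES (1, 5), (2, 9), (3, 6)) x(k, v)
)
SELECT m.[Key],
       m.[Value],
       s.rn,                                              -- position in the pattern (NULL if not in it)
       ROW_NUMBER() OVER (ORDER BY m.[Key]) AS pos,       -- position in the table
       ROW_NUMBER() OVER (ORDER BY m.[Key]) - s.rn AS grp -- constant along an in-order run
FROM @t AS m
LEFT JOIN Seq AS s ON m.[Value] = s.v
ORDER BY m.[Key];
```

With this sample data, keys 1 to 3 get grp = 0 (a complete run of three rows), key 4 gets grp = NULL because 7 is not in the pattern, and keys 5 and 6 get grp = 4 (a partial run of only two rows).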

All that is left is to filter out partial groups, i.e. keep only the groups that contain all three values of the sequence:

;WITH Seq AS (
    SELECT v, ROW_NUMBER() OVER(ORDER BY k) AS rn
    FROM (VALUES(1, 5), (2, 9), (3, 6)) x(k,v)
), Grp AS (
SELECT [Key], [Value], 
       ROW_NUMBER() OVER (ORDER BY [Key]) - rn AS grp            
FROM mytable AS m
LEFT JOIN Seq AS s ON m.Value = s.v
)
SELECT g1.[Key], g1.[Value]
FROM Grp AS g1
INNER JOIN (
   SELECT grp
   FROM Grp
   GROUP BY grp
   HAVING COUNT(*) = 3 ) AS g2
ON g1.grp = g2.grp


Note: The initial version of this answer used an INNER JOIN to Seq. That does not work if the table contains values like 5, 42, 9, 6: the INNER JOIN filters out 42 before the rows are numbered, so this sequence is falsely identified as a valid one. Credit goes to @HABO for this edit.
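The difference between the two joins can be checked with a small sketch. The table variable @t below is hypothetical sample data holding exactly the values 5, 42, 9, 6 from the note; the CTE names InnerGrp and LeftGrp are illustrative and not part of the original answer.

```sql
-- Hypothetical sample data: the problematic sequence from the note.
DECLARE @t TABLE ([Key] INT PRIMARY KEY, [Value] INT NOT NULL);
INSERT INTO @t ([Key], [Value])
VALUES (1, 5), (2, 42), (3, 9), (4, 6);

;WITH Seq AS (
    SELECT v, ROW_NUMBER() OVER (ORDER BY k) AS rn
    FROM (VALUES (1, 5), (2, 9), (3, 6)) x(k, v)
), InnerGrp AS (
    -- 42 is dropped before numbering, so 5, 9, 6 look adjacent
    SELECT m.[Key], ROW_NUMBER() OVER (ORDER BY m.[Key]) - s.rn AS grp
    FROM @t AS m
    INNER JOIN Seq AS s ON m.[Value] = s.v
), LeftGrp AS (
    -- 42 keeps its row position, so 9 and 6 are shifted by one
    SELECT m.[Key], ROW_NUMBER() OVER (ORDER BY m.[Key]) - s.rn AS grp
    FROM @t AS m
    LEFT JOIN Seq AS s ON m.[Value] = s.v
)
SELECT 'INNER JOIN' AS variant, [Key], grp FROM InnerGrp
UNION ALL
SELECT 'LEFT JOIN', [Key], grp FROM LeftGrp
ORDER BY variant, [Key];
```

For the INNER JOIN variant, keys 1, 3 and 4 all get grp = 0, so HAVING COUNT(*) = 3 would accept them. For the LEFT JOIN variant, key 1 gets grp = 0, key 2 gets NULL, and keys 3 and 4 get grp = 1, so no group reaches three rows and the broken sequence is rejected.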

Giorgos Betsos answered Nov 02 '22 23:11



Not very optimized, but I think it is a proper answer:

CREATE TABLE pattern (
    rowID INT IDENTITY(1,1) PRIMARY KEY, 
    rowValue INT NOT NULL
);

INSERT INTO pattern (rowValue) VALUES (5);
INSERT INTO pattern (rowValue) VALUES (9);
INSERT INTO pattern (rowValue) VALUES (6);

SELECT * FROM pattern;

SELECT Trg.* FROM Keys Trg 
INNER JOIN pattern Pt ON (Trg.fValue = Pt.rowValue) 
INNER JOIN (
    SELECT K.fKey - P.rowID AS X, COUNT(*) AS Xc FROM Keys K 
        LEFT JOIN pattern P ON (K.fValue = P.rowValue) 
    WHERE 
        (P.rowID IS NOT NULL) 
    GROUP BY K.fKey - P.rowID 
    HAVING COUNT(*) = (SELECT COUNT(*) FROM pattern)
) Z ON (Trg.fKey - Pt.rowID = Z.X);

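I store the pattern in its own table and join it to the main table. For each matching row I calculate the difference between the row's key (fKey) and the position of its value in the pattern (rowID). Rows that belong to the same complete occurrence of the pattern share the same difference, so I show only the rows whose difference occurs as many times as there are rows in the pattern table. Note that this relies on the fKey values being consecutive, with no gaps.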
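The difference fKey - rowID only lines up when the fKey values are consecutive. The sketch below illustrates this with hypothetical table variables @pattern and @keys (sample data with gaps of 10 between keys) and compares the key-based difference with a position-based one built on ROW_NUMBER(). The position-based column is a variation added here for illustration, not part of the original answer.

```sql
-- Hypothetical sample data; explicit rowID values stand in for the IDENTITY column.
DECLARE @pattern TABLE (rowID INT PRIMARY KEY, rowValue INT NOT NULL);
INSERT INTO @pattern (rowID, rowValue) VALUES (1, 5), (2, 9), (3, 6);

-- Keys are ordered but NOT consecutive.
DECLARE @keys TABLE (fKey INT PRIMARY KEY, fValue INT NOT NULL);
INSERT INTO @keys (fKey, fValue) VALUES (10, 5), (20, 9), (30, 6);

SELECT K.fKey,
       K.fValue,
       K.fKey - P.rowID AS X_by_key,      -- differs per row when keys have gaps
       K.pos  - P.rowID AS X_by_position  -- constant along an in-order run
FROM (
    -- Number every row first, so non-matching rows still occupy a position
    SELECT fKey, fValue, ROW_NUMBER() OVER (ORDER BY fKey) AS pos
    FROM @keys
) AS K
INNER JOIN @pattern AS P ON K.fValue = P.rowValue
ORDER BY K.fKey;
```

Here X_by_key is 9, 18 and 27, so grouping on it finds no complete pattern, while X_by_position is 0 for all three rows. If gaps in the key column are possible, grouping on the position-based difference is one way to keep the same approach working.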

Bogdan Bogdanov answered Nov 02 '22 23:11
