Querying sequences of rows in SQL

Tags:

sql

Suppose I am storing events associated with users in a table as follows (with dt standing in for the timestamp of the event):

| dt | user | event |
|  1 |  1   |   A   |
|  2 |  1   |   D   |
|  3 |  1   |   B   |
|  4 |  1   |   C   |
|  5 |  1   |   B   |
|  6 |  2   |   B   |
|  7 |  2   |   B   |
|  8 |  2   |   A   |
|  9 |  2   |   A   |
| 10 |  2   |   C   |

Such that we could say:

user 1 has an event-sequence of ADBCB
user 2 has an event-sequence of BBAAC

The types of questions I would want to answer about these users are very easy to express as regular expressions on the event-sequences, e.g. "which users have an event-sequence matching A.*B?" or "which users have an event-sequence matching A[^C]*B[^C]*D?" etc.

What would be a good SQL technique or operator I could use to answer similar queries over this table structure?

Is there a way to efficiently/dynamically generate a table of user-to-event-sequence which could then be queried with regex?

I am currently looking at using Postgres, but I am curious to know if any of the bigger DBMSs such as SQL Server or Oracle have specialized operators for this as well.


asked Apr 24 '11 14:04

nicolaskruchten

2 Answers

With Postgres 9.x this is actually quite easy:

select userid, 
       string_agg(event, '' order by dt) as event_sequence
from events
group by userid;

Using that result you can now apply a regular expression on the event_sequence:

select * 
from (
  select userid, 
         string_agg(event, '' order by dt) as event_sequence
  from events
  group by userid
) t
where event_sequence ~ 'A.*B';
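
To try this against the sample data from the question, a self-contained sketch might look like the following. The table name events and the column userid follow the queries above (the question calls the column user, which is a reserved word in Postgres); the column types are assumptions, not something the question specifies.

```sql
-- Assumed schema: the types are guesses, the names follow the query above.
create table events (
  dt     integer primary key,
  userid integer not null,
  event  text    not null
);

-- Sample rows copied from the question.
insert into events (dt, userid, event) values
  (1, 1, 'A'), (2, 1, 'D'), (3, 1, 'B'), (4, 1, 'C'), (5, 1, 'B'),
  (6, 2, 'B'), (7, 2, 'B'), (8, 2, 'A'), (9, 2, 'A'), (10, 2, 'C');

-- Second pattern from the question: A, then B, then D,
-- with no C between A and B or between B and D.
select *
from (
  select userid,
         string_agg(event, '' order by dt) as event_sequence
  from events
  group by userid
) t
where event_sequence ~ 'A[^C]*B[^C]*D';
```

With these rows the aggregated sequences are ADBCB and BBAAC, so 'A.*B' matches only user 1, and the second pattern matches neither user because no sequence has a D after a B.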

With Postgres 8.x you need to find a replacement for the string_agg() function (there are many examples online), and you need a sub-select to ensure the ordering of the aggregate, because 8.x does not support an order by inside an aggregate function.
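
One possible replacement on 8.x, given here as a sketch and not the only option, is to build an array from an ordered correlated sub-select and join it with array_to_string(). It assumes the same events table and userid column as above.

```sql
-- Sketch for Postgres 8.x: array(subquery) keeps the sub-select's ORDER BY,
-- and array_to_string() joins the elements with an empty separator.
select *
from (
  select u.userid,
         array_to_string(
           array(select e.event
                 from events e
                 where e.userid = u.userid
                 order by e.dt),
           ''
         ) as event_sequence
  from (select distinct userid from events) u
) t
where event_sequence ~ 'A.*B';
```

The correlated sub-select runs once per user, so on a large table this is likely to be slower than the string_agg() version available in 9.x.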

answered Sep 27 '22 20:09

a_horse_with_no_name

I'm not at a computer to write code for this answer, but here's how I would go about a RegEx-based solution in SQL Server:

1. Build a string from the result set. Something like http://blog.sqlauthority.com/2009/11/25/sql-server-comma-separated-values-csv-from-table-column/ should work if you omit the comma.
2. Run your RegEx match against the resulting string. SQL Server does not provide this functionality natively, but you can use a CLR function for this purpose, as described at http://www.ideaexcursion.com/2009/08/18/sql-server-regular-expression-clr-udf/

This should ultimately provide the functionality in SQL Server that your original question asks for. However, if you're analyzing a very large dataset, this could be quite slow, and there may be better ways to accomplish what you're looking for.
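
As a rough, untested sketch of the string-building step in T-SQL, the FOR XML PATH('') idiom can concatenate each user's events in dt order. The events table and userid column are assumed from the Postgres answer. For the simplest pattern, A.*B, the built-in LIKE operator is enough; the second pattern from the question would still need a CLR regex function.

```sql
-- T-SQL sketch: FOR XML PATH('') concatenates the ordered event values per user.
-- Assumes event values contain no XML-special characters such as & or <.
SELECT t.userid, t.event_sequence
FROM (
  SELECT u.userid,
         (SELECT '' + e.event
          FROM events e
          WHERE e.userid = u.userid
          ORDER BY e.dt
          FOR XML PATH('')) AS event_sequence
  FROM (SELECT DISTINCT userid FROM events) u
) t
WHERE t.event_sequence LIKE '%A%B%';  -- an A followed later by a B, like the regex A.*B
```

Whether LIKE is case-sensitive depends on the column's collation. On SQL Server 2017 and later, STRING_AGG(event, '') WITHIN GROUP (ORDER BY dt) should be able to replace the FOR XML PATH sub-select.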

answered Sep 27 '22 22:09

Taylor Gerring
