Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL: Can a single OVER clause support multiple window functions?

Let's say I've got a list of pageview events, each of which has a session id. For each event I want to add the time and URL of the chronologically first pageview in that event's session. For example, lets say my events are in table test which looks like this:

uid | session_id | timestamp | url
----------------------------------------------------
u1    0            0           a.com/
u1    1            1           a.com/p1
u1    1            2           a.com/p2

I want a SQL command that produces the following:

uid | session_id | timestamp | url      | s_timestamp | s_url
---------------------------------------------------------------------
u1    0            0           a.com/     0             a.com/
u1    1            1           a.com/p1   1             a.com/p1
u1    1            2           a.com/p2   1             a.com/p1

Window functions seem to be the way to go here, but I'm pretty new to them. The following statement produces the desired table, but I'm wondering if it's suboptimal

SELECT
    uid,
    session_id,
    timestamp,
    url,
    first_value(url) OVER (PARTITION BY uid, session_id ORDER BY timestamp ASC) s_url,
    first_value(timestamp) OVER (PARTITION BY uid, session_id ORDER BY timestamp ASC) s_timestamp
FROM test

In particular, it seems wrong that I use the OVER clause twice. Is there a way to select the chronologically first URL and timestamp from a session using a single OVER clause? I'm using SPARK SQL, but I will accept any answer that is applicable to multiple major SQL systems.

like image 475
conradlee Avatar asked Sep 29 '16 13:09

conradlee


People also ask

Can you have multiple window functions in SQL?

A query can contain multiple window functions that slice up the data in different ways using different OVER clauses, but they all act on the same collection of rows defined by this virtual table.

Can you use window function in having clause?

This order of operations implies that you can only use window functions in SELECT and ORDER BY . That is, window functions are not accessible in WHERE , GROUP BY , or HAVING clauses. For this reason, you cannot use any of these functions in WHERE : ROW_NUMBER() , RANK() , DENSE_RANK() , LEAD() , LAG() , or NTILE() .

Can we use GROUP BY and window function together?

The reason why window functions are not allowed in GROUP BY is the order of operations in SQL. The clauses of a SQL query are processed in a different order than they are written in the query.


1 Answers

It's possible to use the WINDOW keyword to name a window that can then be references in the SELECT clause:

SELECT
    uid,
    session_id,
    timestamp,
    url,
    first_value(url) OVER w s_url,
    first_value(timestamp) OVER w s_timestamp
FROM test
WINDOW w AS (PARTITION BY uid, session_id ORDER BY timestamp ASC)

This works in Apache Spark SQL and HiveQL.

like image 169
conradlee Avatar answered Sep 29 '22 07:09

conradlee