I have this chunk of pandas code and I want to rewrite it in SQL. Does anyone know what the equivalent SQL would look like?
lags = range(1, 5)
df = df.assign(**{
    '{}{}'.format('lag', t): df.groupby('article_id').num_views.shift(t)
    for t in lags
})
UPDATE:
I am looking for standard SQL rather than a vendor-specific dialect. Here is a sample of the dataset (first 10 rows):
   article_id   section  time   num_views  comments
0  abc111b      A        00:00         15         0
1  abc111b      A        01:00         36         0
2  abc111b      A        02:00         36         0
3  bbbddd222hf  A        03:00         41         0
4  bbbddd222hf  B        04:00         44         0
5  nnn678www    B        05:00         39         0
6  nnn678www    B        06:00         38         0
7  nnn678www    B        07:00         66         0
8  nnn678www    C        08:00         65         0
9  nnn678www    C        09:00         87         1
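You can use the LAG() function, one of the window functions defined in the ANSI/ISO SQL standard. Its third argument (0 in the query here) is the value returned when no earlier row exists in the partition; leave it out to get NULL instead, which is closer to the NaN that pandas shift() produces: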
select
    article_id, section, time, num_views, comments,
    lag(num_views, 1, 0) over (partition by article_id order by time) as lag1,
    lag(num_views, 2, 0) over (partition by article_id order by time) as lag2,
    lag(num_views, 3, 0) over (partition by article_id order by time) as lag3,
    lag(num_views, 4, 0) over (partition by article_id order by time) as lag4
from tab;
Complete and working SQLFiddle example...
PS: Please be aware that not every RDBMS implements window functions, and some added them only in later versions.
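If you want to try this without setting up a database server, here is a minimal sketch that runs the same kind of query from Python with the built-in sqlite3 module. It assumes your Python ships SQLite 3.25 or newer (the first version with window functions). The table name tab follows the query above, the column types are my guess from the sample data, and I left out the default argument of lag() so that missing lags come back as NULL (None in Python), like the NaN you get from shift().

```python
import sqlite3

# Sample rows copied from the question:
# (article_id, section, time, num_views, comments)
rows = [
    ("abc111b", "A", "00:00", 15, 0),
    ("abc111b", "A", "01:00", 36, 0),
    ("abc111b", "A", "02:00", 36, 0),
    ("bbbddd222hf", "A", "03:00", 41, 0),
    ("bbbddd222hf", "B", "04:00", 44, 0),
    ("nnn678www", "B", "05:00", 39, 0),
    ("nnn678www", "B", "06:00", 38, 0),
    ("nnn678www", "B", "07:00", 66, 0),
    ("nnn678www", "C", "08:00", 65, 0),
    ("nnn678www", "C", "09:00", 87, 1),
]

conn = sqlite3.connect(":memory:")
# Column types are assumed from the sample data.
conn.execute(
    "create table tab ("
    "article_id text, section text, time text, "
    "num_views integer, comments integer)"
)
conn.executemany("insert into tab values (?, ?, ?, ?, ?)", rows)

# Build one lag(...) column per offset, mirroring the pandas dict comprehension.
lags = range(1, 5)
lag_cols = ",\n    ".join(
    f"lag(num_views, {t}) over (partition by article_id order by time) as lag{t}"
    for t in lags
)
query = (
    "select article_id, time, num_views,\n"
    f"    {lag_cols}\n"
    "from tab\n"
    "order by article_id, time"
)

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Each printed tuple is (article_id, time, num_views, lag1, lag2, lag3, lag4). The first row of every article has None in all four lag columns, because there is nothing earlier in its partition to look back at.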