How can I create a new column in SQL (on Snowflake) where it counts the previous occurrences of an ID value for a time series. Should go something like this:
Input Data
| Date     | ID       |
| -------- | -------- |
| 1/1/2025 | a1       |
| 1/2/2025 | a2       |
| 1/3/2025 | a2       |
| 1/4/2025 | a1       |
| 1/5/2025 | a3       |
| 1/6/2025 | a1       |
Output Data
| Date     | ID       | Count    |
| -------- | -------- | -------- |
| 1/1/2025 | a1       | 0        | 
| 1/2/2025 | a2       | 0        |
| 1/3/2025 | a2       | 1        |
| 1/4/2025 | a1       | 1        |
| 1/5/2025 | a3       | 0        |
| 1/6/2025 | a1       | 2        |
Thanks!
All other answers I could find searching for this only counted if the ID val appeared in the the immediately previous row. I want to look back for the entire time series. It is a large dataset too so don't want to break the server.
Using cumulative COUNT:
SELECT *, COUNT(*) OVER(PARTITION BY ID ORDER BY Date) - 1 AS cnt
FROM tab;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With