I have the following table:
CREATE TABLE trajectory(
user_id int,
session_id int,
timestamp timestamp with time zone,
lat double precision,
lon double precision
);
INSERT INTO trajectory(user_id, session_id, timestamp, lat, lon) VALUES
(1, 25304,'2008-10-23 02:53:04+01', 39.984702, 116.318417),
(1, 25304, '2008-10-23 02:53:10+01', 39.984683, 116.31845),
(1, 25304, '2008-10-23 02:53:15+01', 39.984686, 116.318417),
(1, 25304, '2008-10-23 02:53:20+01', 39.984688, 116.318385),
(1, 20959,'2008-10-24 02:09:59+01', 40.008304, 116.319876),
(1, 20959,'2008-10-24 02:10:04+01', 40.008413, 116.319962),
(1, 20959,'2008-10-24 02:10:14+01', 40.007171, 116.319458),
(2, 55305, '2008-10-23 05:53:05+01', 39.984094, 116.319236),
(2, 55305, '2008-10-23 05:53:11+01', 39.984198, 116.319322),
(2, 55305, '2008-10-23 05:53:21+01', 39.984224, 116.319402),
(2, 34104, '2008-10-23 23:41:04+01', 40.013867, 116.306473),
(2, 34104, '2008-10-23 23:41:16+01', 40.013907, 116.306488);
Question:
I want to add a column trip_id to this table, based on the user_id and session_id columns, so that when the user's session id changes, I know the user is on a new trip so I add that id to the new trip column.
Required output:
user_id |session_id |timestamp | lat | lon | trip_id
--------|------------|-----------------------|--------------|-----------|-----------
1 | 25304 |2008-10-23 02:53:04+01 | 39.984702 |116.318417 | 1
1 | 25304 |2008-10-23 02:53:10+01 | 39.984683 |116.31845 | 1
1 | 25304 |2008-10-23 02:53:15+01 | 39.984686 |116.318417 | 1
1 | 25304 |2008-10-23 02:53:20+01 | 39.984688 |116.318385 | 1
1 | 20959 |2008-10-24 02:09:59+01 |40.008304 |116.319876 | 2
1 | 20959 |2008-10-24 02:10:04+01 |40.008413 |116.319962 | 2
1 | 20959 |2008-10-24 02:10:14+01 |40.007171 |116.319458 | 2
2 | 55305 |2008-10-23 05:53:05+01 |39.984094 |116.319236 | 1
2 | 55305 |2008-10-23 05:53:11+01 |39.984198 |116.319322 | 1
2 | 55305 |2008-10-23 05:53:21+01 |39.984224 |116.319402 | 1
2 | 34104 |2008-10-23 23:41:04+01 |40.013867 |116.306473 | 2
2 | 34104 |2008-10-23 23:41:16+01 |40.013907 |116.306488 | 2
How can I do this?
EDIT
Thanks for these great answers, but all the answers received are kind of retrieving table values, they do not modify the table. Plus, I added the timestamp column that I initially omitted thinking that would simplify my question.
We can approach this with a gaps-and-island technique. Assuming that you have a column to order your dataset, say ordering_id:
select t.*,
count(*)
filter(where session_id is distinct from lag_session_id)
over(partition by user_id order by ordering_id) trip_id
from (
select t.*,
lag(session_id) over(partition by user_id order by ordering_id) lag_session_id
from trajectory t
) t
The idea is to bring the "preceding" session id of the same user with lag, then to compare it with the value on the current row; we can then count how many times it changed, which defines the trip id.
| user_id | session_id | lat | lon | ordering_id | lag_session_id | trip_id |
|---|---|---|---|---|---|---|
| 1 | 25304 | 39.984702 | 116.318417 | 1 | null | 1 |
| 1 | 25304 | 39.984683 | 116.31845 | 2 | 25304 | 1 |
| 1 | 25304 | 39.984686 | 116.318417 | 3 | 25304 | 1 |
| 1 | 25304 | 39.984688 | 116.318385 | 4 | 25304 | 1 |
| 1 | 20959 | 40.008304 | 116.319876 | 5 | 25304 | 2 |
| 1 | 20959 | 40.008413 | 116.319962 | 6 | 20959 | 2 |
| 1 | 20959 | 40.007171 | 116.319458 | 7 | 20959 | 2 |
| 2 | 55305 | 39.984094 | 116.319236 | 8 | null | 1 |
| 2 | 55305 | 39.984198 | 116.319322 | 9 | 55305 | 1 |
| 2 | 55305 | 39.984224 | 116.319402 | 10 | 55305 | 1 |
| 2 | 34104 | 40.013867 | 116.306473 | 11 | 55305 | 2 |
| 2 | 34104 | 40.013907 | 116.306488 | 12 | 34104 | 2 |
fiddle
Yet another approach would involve:
DENSE RANK by ordering on it.If you don't have any ordering field, you can use "ctid" for that purpose: it's a field that stores the physical location of each row within the table. Otherwise use the timestamp field in place of "ctid".
ALTER TABLE trajectory ADD COLUMN trip_id INT;
WITH cte AS (
SELECT ctid,
user_id,
MIN(ctid) OVER(PARTITION BY user_id, session_id) AS parts
FROM trajectory
), cte2 AS (
SELECT ctid,
DENSE_RANK() OVER(PARTITION BY user_id ORDER BY parts) AS trip_id
FROM cte
)
UPDATE trajectory
SET trip_id = cte2.trip_id
FROM cte2
WHERE trajectory.ctid = cte2.ctid;
Output:
| user_id | session_id | lat | lon | trip_id |
|---|---|---|---|---|
| 1 | 25304 | 39.984702 | 116.318417 | 1 |
| 1 | 25304 | 39.984683 | 116.31845 | 1 |
| 1 | 25304 | 39.984686 | 116.318417 | 1 |
| 1 | 25304 | 39.984688 | 116.318385 | 1 |
| 1 | 20959 | 40.008304 | 116.319876 | 2 |
| 1 | 20959 | 40.008413 | 116.319962 | 2 |
| 1 | 20959 | 40.007171 | 116.319458 | 2 |
| 2 | 55305 | 39.984094 | 116.319236 | 1 |
| 2 | 55305 | 39.984198 | 116.319322 | 1 |
| 2 | 55305 | 39.984224 | 116.319402 | 1 |
| 2 | 34104 | 40.013867 | 116.306473 | 2 |
| 2 | 34104 | 40.013907 | 116.306488 | 2 |
Check the demo here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With