Suppose I have a table consisting of entries like
ID Arrival Date Arrival City Departure Date Departure City
1 Jun 27 2015 Berlin Jun 20 2015 Paris
1 Jul 1 2015 Rome Jun 29 2015 Berlin
1 Jul 30 2015 Vienna Jul 15 2015 Rome
2 Jun 28 2015 Prague Jun 23 2015 Vienna
2 Jul 1 2015 Rome Jun 29 2015 Prague
2 Jul 30 2015 Vienna Jul 15 2015 Moscow
...
and for each ID I want to join this data on itself such that observations with subsequent Departure Date
and Arrival Date
are grouped pairwise - i.e. a departure is paired with the previous arrival for each ID.
In the example above (where the observations are sorted for convenience) the 2nd row would be appended to the 1st, the 3rd to the 2nd, the 5th to the 4th and the 6th to the 5th (thus producing 4 rows with fields ID Arrival Date Arrival City Departure Date Departure City Arrival Date2 Arrival City2 Departure Date2 Departure City2
).
There could potentially be more than three departures for each ID so a general approach is required. Also please note that there can be holes in the data where Arrival City
and Departure City
does not match - e.g. the Arrival City
of the 5th row is not the Departure City
of the 6th row but they should still be merged. In fact a major goal is to get a better view of how many holes there are in the data.
The SELF JOIN in SQL, as its name implies, is used to join a table to itself. This means that each row in a table is joined to itself and every other row in that table. However, referencing the same table more than once within a single query will result in an error. To avoid this, SQL SELF JOIN aliases are used.
You can join different tables by their common columns using the JOIN keyword. It is also possible to join a table to itself, which is known as a self join.
An inner join (sometimes called a simple join) is a join of two or more tables that returns only those rows that satisfy the join condition. A self join is a join of a table to itself. This table appears twice in the FROM clause and is followed by table aliases that qualify column names in the join condition.
A self JOIN is a regular join, but the table is joined with itself – this is extremely useful for comparisons within a table. Joining a table with itself means that each row of the table is combined with itself and with every other row of the table.
A solution is to use a CTE and consider that the difference between two consecutive rows (identified by the rowno) is 1 all the time (and also consider the dates):
;WITH CTE AS (
SELECT
rownum = ROW_NUMBER() OVER (ORDER BY t.ID, t.arrivalDate),
t.ID,
t.arrivalDate,
t.arrivalCity,
t.departureDate,
t.departureCity
FROM #test t
)
SELECT *
FROM CTE c1
JOIN CTE c2
ON c1.ID = c2.ID
AND c2.departureDate > c1.arrivalDate
AND c2.rownum - c1.rownum = 1
GO
-- structure of the #test table
CREATE TABLE #test (
ID int,
arrivalDate date,
arrivalCity varchar(30),
departureDate date,
departureCity varchar(30)
)
SQL fiddle here: SQLFiddle
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With