Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL self join pairwise

Suppose I have a table consisting of entries like

ID    Arrival Date    Arrival City    Departure Date    Departure City
1     Jun 27 2015     Berlin          Jun 20 2015       Paris
1     Jul 1 2015      Rome            Jun 29 2015       Berlin
1     Jul 30 2015     Vienna          Jul 15 2015       Rome
2     Jun 28 2015     Prague          Jun 23 2015       Vienna
2     Jul 1 2015      Rome            Jun 29 2015       Prague
2     Jul 30 2015     Vienna          Jul 15 2015       Moscow
...

and for each ID I want to join this data on itself such that observations with subsequent Departure Date and Arrival Date are grouped pairwise - i.e. a departure is paired with the previous arrival for each ID.

In the example above (where the observations are sorted for convenience) the 2nd row would be appended to the 1st, the 3rd to the 2nd, the 5th to the 4th and the 6th to the 5th (thus producing 4 rows with fields ID Arrival Date Arrival City Departure Date Departure City Arrival Date2 Arrival City2 Departure Date2 Departure City2).

There could potentially be more than three departures for each ID so a general approach is required. Also please note that there can be holes in the data where Arrival City and Departure City does not match - e.g. the Arrival City of the 5th row is not the Departure City of the 6th row but they should still be merged. In fact a major goal is to get a better view of how many holes there are in the data.

like image 803
user787267 Avatar asked Jan 15 '16 16:01

user787267


People also ask

How does self join work in SQL?

The SELF JOIN in SQL, as its name implies, is used to join a table to itself. This means that each row in a table is joined to itself and every other row in that table. However, referencing the same table more than once within a single query will result in an error. To avoid this, SQL SELF JOIN aliases are used.

Can you join a table to itself in SQL?

You can join different tables by their common columns using the JOIN keyword. It is also possible to join a table to itself, which is known as a self join.

Is self join and inner join are same?

An inner join (sometimes called a simple join) is a join of two or more tables that returns only those rows that satisfy the join condition. A self join is a join of a table to itself. This table appears twice in the FROM clause and is followed by table aliases that qualify column names in the join condition.

Is self join useful?

A self JOIN is a regular join, but the table is joined with itself – this is extremely useful for comparisons within a table. Joining a table with itself means that each row of the table is combined with itself and with every other row of the table.


1 Answers

A solution is to use a CTE and consider that the difference between two consecutive rows (identified by the rowno) is 1 all the time (and also consider the dates):

;WITH CTE AS (
SELECT
    rownum = ROW_NUMBER() OVER (ORDER BY t.ID, t.arrivalDate),
    t.ID,
    t.arrivalDate, 
    t.arrivalCity, 
    t.departureDate, 
    t.departureCity
FROM #test t
)
SELECT *
FROM CTE c1
JOIN CTE c2
ON c1.ID = c2.ID 
    AND c2.departureDate > c1.arrivalDate
    AND c2.rownum - c1.rownum = 1
GO

-- structure of the #test table
CREATE TABLE #test (
    ID int,
    arrivalDate date,
    arrivalCity varchar(30),
    departureDate date,
    departureCity varchar(30)
)

SQL fiddle here: SQLFiddle

like image 66
Eduard Uta Avatar answered Sep 21 '22 15:09

Eduard Uta