I want to solve a problem where I wish to find src to dest flight path including connecting flights and if possible sort it for time required.
Can I use something like listagg to aggregate intermediate flights as a string somehow.
We could cap the number of connecting flights to a number and the time taken.
I have this as a start right now which gives me connecting flights
with cap as (select 30 time_cap , 5 connect_cap),
connecting as
(select f1.src src1
, f1.dest dest1
, f1.stt st1
, f1.endt end1
, f2.src src2
, f2.dest dest2
, f2.stt st2
, f2.endt end2
, (f2.endt - f1.stt) as time
from flight f1
inner join flight f2 on f1.dest = f2.src
where f1.endt < f2.stt)
My table looks like this now
\d+ flight
Table "public.flight"
Column | Type | Modifiers | Storage | Stats target | Description
--------+-----------------------------+-----------+----------+--------------+-------------
src | character varying(20) | | extended | |
dest | character varying(20) | | extended | |
stt | timestamp without time zone | | plain | |
endt | timestamp without time zone | | plain | |
This was an interview question which already got over.
Could a graph bfs kind of solution be solved in an sql query?
Even a non working query (pseudo code - which would work if tried) or approach would do.
In the below query, I want to figure out a way in string_agg where I can check if the last destination is the destination I want to go to.
with flight as (select f1.src||'-'||f1.dest||','||f2.src||'-'||f2.dest route
, f1.src src1
, f1.dest dest1
, f1.stt st1
, f1.endt end1
, f2.src src2
, f2.dest dest2
, f2.stt st2
, f2.endt end2
, (f2.endt - f1.stt) as time
from flight f1
inner join flight f2 on f1.dest = f2.src
where f1.endt < f2.stt)
select string_agg(route,',') from flight ;
The output from query flight
route | src1 | dest1 | st1 | end1 | src2 | dest2 | st2 | end2 | time
---------+------+-------+---------------------+---------------------+------+-------+---------------------+---------------------+----------
a-b,b-c | a | b | 2017-05-17 09:31:56 | 2017-05-17 14:31:56 | b | c | 2017-05-17 15:31:56 | 2017-05-17 16:31:56 | 07:00:00
These types of tree-traversal problems in SQL can be solved using the special syntax WITH RECURSIVE
.
To test the solution, lets create the following table with dummy data:
CREATE TABLE flight (
src TEXT
, dest TEXT
, stt TIMESTAMP
, endt TIMESTAMP);
INSERT INTO flight VALUES
('SIN', 'DAC', '2016-12-31 22:45:00', '2017-01-01 01:45:00'),
('DAC', 'SIN', '2017-01-01 16:30:00', '2017-01-01 21:30:00'),
('SIN', 'DAC', '2017-01-01 22:45:00', '2017-01-02 01:45:00'),
('DAC', 'DEL', '2017-01-01 10:00:00', '2017-01-01 11:30:00'),
('DEL', 'DAC', '2017-01-01 12:30:00', '2017-01-01 14:00:00'),
('DAC', 'CCU', '2017-01-01 10:30:00', '2017-01-01 11:15:00'),
('CCU', 'DAC', '2017-01-01 11:45:00', '2017-01-01 12:30:00'),
('SIN', 'DEL', '2017-01-01 11:00:00', '2017-01-01 15:00:00'),
('DEL', 'SIN', '2017-01-01 16:30:00', '2017-01-01 20:30:00'),
('CCU', 'DEL', '2017-01-01 12:00:00', '2017-01-01 12:45:00'),
('DEL', 'CCU', '2017-01-01 13:15:00', '2017-01-01 14:00:00');
Recursive CTEs are difficult to grok, so I'll build up the logic piece by piece.
In a recursive CTE there are two parts. An anchor subquery & an recursive subquery. The anchor subquery is executed first, and then the recursive subquery is executed until an empty result set is returned. The tricky part (for me at least) is constructing the join that will perform the traversal from 1 node to the next.
The anchor and recursive subqueries are merged using a UNION ALL
(or sometimes UNION
) and returned.
Since we are interested in flight routes, a simple anchor sub query to start with would be:
SELECT src, ARRAY[src] path, dest FROM flight
The above query has 3 columns for start, path & end, in the 2nd column, we convert the src
field from TEXT
to TEXT[]
. While this is not strictly needed, it will greatly simplify the remaining steps because arrays are easy to append to.
Using above anchor query, we can now define our recursive cte.
WITH RECURSIVE flight_paths (src, path, dest) AS (
SELECT
src
, ARRAY[src]
, dest
FROM flight
UNION ALL
SELECT
fp.src
, fp.path || f.src -- appends another 'flight source'
, f.dest -- updates the dest to the new dest
FROM flight f
JOIN flight_paths fp ON f.src = fp.dest
-- this is the join that links the tree nodes
WHERE NOT f.src = ANY(fp.path)
-- this condition prevents infinite recursion by not visiting nodes that have already been visited.
) SELECT * FROM flight_paths
-- finally, we can select from the flight_paths recursive cte
The above returns 136 rows with my test data. The first 15 rows are shown below:
src path dest
SIN {SIN} DAC
DAC {DAC} SIN
SIN {SIN} DAC
DAC {DAC} DEL
DEL {DEL} DAC
DAC {DAC} CCU
CCU {CCU} DAC
SIN {SIN} DEL
DEL {DEL} SIN
CCU {CCU} DEL
DEL {DEL} CCU
DEL {DEL,CCU} DAC
DAC {DAC,CCU} DAC
DEL {DEL,CCU} DEL
DAC {DAC,CCU} DEL
DEL {DEL,CCU} DEL
DAC {DAC,CCU} DEL
This part, the setting up of the tree traversal, is the hardest part of writing a recursive cte. Now, that the traversal is set up, we can modify the above so that:
for this example, let src := SIN
& dest := CCU
WITH RECURSIVE flight_paths (src, path, dest, stt, endt) AS (
SELECT
src
, ARRAY[src]
, dest
, stt
, endt
FROM flight
UNION ALL
SELECT
fp.src
, fp.path || f.src
, f.dest
, fp.stt
, f.endt -- update endt to the new endt
FROM flight f
JOIN flight_paths fp ON f.src = fp.dest
WHERE NOT f.src = ANY(fp.path)
AND NOT 'CCU' = ANY(fp.path)
-- (2) this new predicate stop iteration when the destination is reached
AND f.stt > fp.endt
-- (3) this new predicate only proceeds the iteration if the connecting flights are valid
)
SELECT *
FROM flight_paths
WHERE src = 'SIN' AND dest = 'CCU'
-- (1) specify the start & end
This give 2 possible routes with the departure time of the first flight as the column stt
& the arrival time of the last flight as endt
:
src path dest stt endt
SIN {SIN,DAC} CCU 2016-12-31 22:45:00 2017-01-01 11:15:00
SIN {SIN,DAC,DEL} CCU 2016-12-31 22:45:00 2017-01-01 14:00:00
It is now possible to refine the result set quite easily. For instance, the final query could be modified to only return flights that arrive at the destination before noon as:
SELECT *
FROM flight_paths
WHERE src = 'SIN' AND dest = 'CCU'
AND endt::time < '12:00:00'::time
It is also fairly easy from this point on to add columns that calculate flight time & connection time in the recursive cte and then filter for flights that fit a predicate on those two times. I hope I've given enough details for you to generate these two variations.
You can also cap the number of connections by filtering on the length on the path
array, although it would probably be more efficient to stop the iteration in the recursive cte once the max connections is reached.
One final note: to keep things simple before, I worked with the paths excluding the final destination, e.g. {SIN, DAC, DEL}
instead of the sequence of flights {SIN-DAC, DAC-DEL, DEL-CCU}
(or stop overs {DAC, DEL}
), but we can get those representations fairly easily as shown below:
WITH RECURSIVE flight_paths (src, flights, path, dest, stt, endt) AS (
SELECT
src
, ARRAY[src || '-' || dest]
, ARRAY[src]
, dest
, stt
, endt
FROM flight
UNION ALL
SELECT
fp.src
, fp.flights || (f.src || '-' || f.dest)
, fp.path || f.src
, f.dest
, fp.stt
, f.endt
FROM flight f
JOIN flight_paths fp ON f.src = fp.dest
WHERE NOT f.src = ANY(fp.path)
AND NOT 'CCU' = ANY(fp.path)
AND f.stt > fp.endt
)
SELECT flights, stt, endt, path[2:] stopovers
FROM flight_paths
WHERE src = 'SIN' AND dest = 'CCU'
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With