Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sql query to get src to dest path for connecting flights

Tags:

sql

postgresql

I want to solve a problem where I wish to find src to dest flight path including connecting flights and if possible sort it for time required.

Can I use something like listagg to aggregate intermediate flights as a string somehow.

We could cap the number of connecting flights to a number and the time taken.

I have this as a start right now which gives me connecting flights

with  cap as (select 30 time_cap , 5 connect_cap), 
 connecting as 
    (select f1.src src1
          , f1.dest dest1
          , f1.stt st1
          , f1.endt end1
          , f2.src src2
          , f2.dest dest2
          , f2.stt st2
          , f2.endt end2
          , (f2.endt - f1.stt) as time 
     from flight f1 
     inner join flight f2 on f1.dest = f2.src 
     where f1.endt < f2.stt)

My table looks like this now

\d+ flight
                                  Table "public.flight"
 Column |            Type             | Modifiers | Storage  | Stats target | Description 
--------+-----------------------------+-----------+----------+--------------+-------------
 src    | character varying(20)       |           | extended |              | 
 dest   | character varying(20)       |           | extended |              | 
 stt    | timestamp without time zone |           | plain    |              | 
 endt   | timestamp without time zone |           | plain    |              | 

This was an interview question which already got over.

Could a graph bfs kind of solution be solved in an sql query?

Even a non working query (pseudo code - which would work if tried) or approach would do.

In the below query, I want to figure out a way in string_agg where I can check if the last destination is the destination I want to go to.

with flight as (select f1.src||'-'||f1.dest||','||f2.src||'-'||f2.dest route
                     , f1.src src1
                     , f1.dest dest1
                     , f1.stt st1
                     , f1.endt end1
                     , f2.src src2
                     , f2.dest dest2
                     , f2.stt st2
                     , f2.endt end2
                     , (f2.endt - f1.stt) as time 
                from flight f1 
                inner join flight f2 on f1.dest = f2.src 
                where f1.endt < f2.stt) 

select string_agg(route,',') from flight ;

The output from query flight

  route  | src1 | dest1 |         st1         |        end1         | src2 | dest2 |         st2         |        end2         |   time   
---------+------+-------+---------------------+---------------------+------+-------+---------------------+---------------------+----------
 a-b,b-c | a    | b     | 2017-05-17 09:31:56 | 2017-05-17 14:31:56 | b    | c     | 2017-05-17 15:31:56 | 2017-05-17 16:31:56 | 07:00:00
like image 736
Akshay Hazari Avatar asked Sep 14 '25 10:09

Akshay Hazari


1 Answers

These types of tree-traversal problems in SQL can be solved using the special syntax WITH RECURSIVE.

To test the solution, lets create the following table with dummy data:

CREATE TABLE flight (
  src TEXT
, dest TEXT
, stt TIMESTAMP
, endt TIMESTAMP);

INSERT INTO flight VALUES 
('SIN', 'DAC', '2016-12-31 22:45:00', '2017-01-01 01:45:00'),
('DAC', 'SIN', '2017-01-01 16:30:00', '2017-01-01 21:30:00'),
('SIN', 'DAC', '2017-01-01 22:45:00', '2017-01-02 01:45:00'),
('DAC', 'DEL', '2017-01-01 10:00:00', '2017-01-01 11:30:00'),
('DEL', 'DAC', '2017-01-01 12:30:00', '2017-01-01 14:00:00'),
('DAC', 'CCU', '2017-01-01 10:30:00', '2017-01-01 11:15:00'),
('CCU', 'DAC', '2017-01-01 11:45:00', '2017-01-01 12:30:00'),
('SIN', 'DEL', '2017-01-01 11:00:00', '2017-01-01 15:00:00'),
('DEL', 'SIN', '2017-01-01 16:30:00', '2017-01-01 20:30:00'),
('CCU', 'DEL', '2017-01-01 12:00:00', '2017-01-01 12:45:00'),
('DEL', 'CCU', '2017-01-01 13:15:00', '2017-01-01 14:00:00');

Recursive CTEs are difficult to grok, so I'll build up the logic piece by piece.

In a recursive CTE there are two parts. An anchor subquery & an recursive subquery. The anchor subquery is executed first, and then the recursive subquery is executed until an empty result set is returned. The tricky part (for me at least) is constructing the join that will perform the traversal from 1 node to the next.

The anchor and recursive subqueries are merged using a UNION ALL (or sometimes UNION) and returned.

Since we are interested in flight routes, a simple anchor sub query to start with would be:

SELECT src, ARRAY[src] path, dest FROM flight

The above query has 3 columns for start, path & end, in the 2nd column, we convert the src field from TEXT to TEXT[]. While this is not strictly needed, it will greatly simplify the remaining steps because arrays are easy to append to.

Using above anchor query, we can now define our recursive cte.

WITH RECURSIVE flight_paths (src, path, dest) AS (
SELECT
  src
, ARRAY[src]
, dest 
FROM flight
UNION ALL
SELECT
  fp.src
, fp.path || f.src -- appends another 'flight source'
, f.dest -- updates the dest to the new dest
FROM flight f
JOIN flight_paths fp ON f.src = fp.dest 
-- this is the join that links the tree nodes
WHERE NOT f.src = ANY(fp.path) 
-- this condition prevents infinite recursion by not visiting nodes that have already been visited.
) SELECT * FROM flight_paths
-- finally, we can select from the flight_paths recursive cte

The above returns 136 rows with my test data. The first 15 rows are shown below:

src path        dest
SIN {SIN}       DAC
DAC {DAC}       SIN
SIN {SIN}       DAC
DAC {DAC}       DEL
DEL {DEL}       DAC
DAC {DAC}       CCU
CCU {CCU}       DAC
SIN {SIN}       DEL
DEL {DEL}       SIN
CCU {CCU}       DEL
DEL {DEL}       CCU
DEL {DEL,CCU}   DAC
DAC {DAC,CCU}   DAC
DEL {DEL,CCU}   DEL
DAC {DAC,CCU}   DEL
DEL {DEL,CCU}   DEL
DAC {DAC,CCU}   DEL

This part, the setting up of the tree traversal, is the hardest part of writing a recursive cte. Now, that the traversal is set up, we can modify the above so that:

  1. we start at the source & end at the destination
  2. stop iteration when the destination is reached
  3. only consider connecting flights A-B & B-C to be valid if arrival(A-B) < departure(B-C)

for this example, let src := SIN & dest := CCU

WITH RECURSIVE flight_paths (src, path, dest, stt, endt) AS (
SELECT
  src
, ARRAY[src]
, dest 
, stt
, endt
FROM flight
UNION ALL
SELECT
  fp.src
, fp.path || f.src
, f.dest 
, fp.stt
, f.endt -- update endt to the new endt
FROM flight f
JOIN flight_paths fp ON f.src = fp.dest 
WHERE NOT f.src = ANY(fp.path) 
  AND NOT 'CCU' = ANY(fp.path) 
  -- (2) this new predicate stop iteration when the destination is reached
  AND f.stt > fp.endt
  -- (3) this new predicate only proceeds the iteration if the connecting flights are valid
) 
SELECT * 
FROM flight_paths
WHERE src = 'SIN' AND dest = 'CCU'
-- (1) specify the start & end

This give 2 possible routes with the departure time of the first flight as the column stt & the arrival time of the last flight as endt:

src path            dest    stt                 endt
SIN {SIN,DAC}       CCU     2016-12-31 22:45:00 2017-01-01 11:15:00
SIN {SIN,DAC,DEL}   CCU     2016-12-31 22:45:00 2017-01-01 14:00:00

It is now possible to refine the result set quite easily. For instance, the final query could be modified to only return flights that arrive at the destination before noon as:

SELECT * 
FROM flight_paths
WHERE src = 'SIN' AND dest = 'CCU' 
  AND endt::time < '12:00:00'::time

It is also fairly easy from this point on to add columns that calculate flight time & connection time in the recursive cte and then filter for flights that fit a predicate on those two times. I hope I've given enough details for you to generate these two variations.

You can also cap the number of connections by filtering on the length on the path array, although it would probably be more efficient to stop the iteration in the recursive cte once the max connections is reached.

One final note: to keep things simple before, I worked with the paths excluding the final destination, e.g. {SIN, DAC, DEL} instead of the sequence of flights {SIN-DAC, DAC-DEL, DEL-CCU} (or stop overs {DAC, DEL}), but we can get those representations fairly easily as shown below:

WITH RECURSIVE flight_paths (src, flights, path, dest, stt, endt) AS (
SELECT
  src
, ARRAY[src || '-' || dest]
, ARRAY[src]
, dest 
, stt
, endt
FROM flight
UNION ALL
SELECT
  fp.src
, fp.flights || (f.src || '-' || f.dest)
, fp.path || f.src
, f.dest
, fp.stt
, f.endt
FROM flight f
JOIN flight_paths fp ON f.src = fp.dest 
WHERE NOT f.src = ANY(fp.path) 
  AND NOT 'CCU' = ANY(fp.path) 
  AND f.stt > fp.endt
) 
SELECT flights, stt, endt, path[2:] stopovers
FROM flight_paths
WHERE src = 'SIN' AND dest = 'CCU'
like image 95
Haleemur Ali Avatar answered Sep 15 '25 23:09

Haleemur Ali