Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Oracle: get countries separated by N borders

I would like to get all countries separated by N (1,2,3,4 ...) borders from a specified country.

N should also be specified.

For example I have the table "borders" and "country":


border | neighbor
-----------------
  FR   |   DE
  FR   |   IT
  IT   |   FR
  DE   |   FR
  DE   |   PL
  PL   |   DE
  DE   |   DK
  DK   |   DE


CODE COUNTRYNAME  
---- ---------
FR   France   
DE   Germany
RU   Russia   
IT   Italy
PL   Poland      
DK   Denmark
  1. N = 2 & country = France

If I want to get countries separated by 2 borders from France, it should return Poland (FR -> DE -> PL) and Denmark (FR -> DE -> DK)

  1. N = 1 & country = France

If I want to get countries separated by 1 border from France, it should return Germany (FR -> DE) and Italy (FR -> IT)

I can modify borders if needed.

I tried some hierarchical queries without success.

Thanks BR

like image 342
Hayk Avatar asked May 20 '18 12:05

Hayk


3 Answers

Here's a complete enumeration of all possible paths and their lengths, given a starting country and no limitation as to paths being the shortest possible paths between two countries (disclaimer, don't run this on too many countries):

WITH 
  countries AS (SELECT DISTINCT border country FROM t),
  chains (country, path, destination, steps) AS (
    SELECT country, country, country, 0
    FROM countries
    UNION ALL
    SELECT chains.country, chains.path || '->' || t.neighbor, t.neighbor, chains.steps + 1
    FROM chains
    JOIN t ON chains.destination = t.border 
    AND chains.path NOT LIKE '%' || t.neighbor || '%' -- This prevents cycles
  )
SELECT *
FROM chains
ORDER BY country, steps;

The result being:

| COUNTRY |           PATH | DESTINATION | STEPS |
|---------|----------------|-------------|-------|
|      DE |             DE |          DE |     0 |
|      DE |         DE->PL |          PL |     1 |
|      DE |         DE->FR |          FR |     1 |
|      DE |         DE->DK |          DK |     1 |
|      DE |     DE->FR->IT |          IT |     2 |
|      DK |             DK |          DK |     0 |
|      DK |         DK->DE |          DE |     1 |
|      DK |     DK->DE->FR |          FR |     2 |
|      DK |     DK->DE->PL |          PL |     2 |
|      DK | DK->DE->FR->IT |          IT |     3 |
|      FR |             FR |          FR |     0 |
|      FR |         FR->IT |          IT |     1 |
|      FR |         FR->DE |          DE |     1 |
|      FR |     FR->DE->DK |          DK |     2 |
|      FR |     FR->DE->PL |          PL |     2 |
|      IT |             IT |          IT |     0 |
|      IT |         IT->FR |          FR |     1 |
|      IT |     IT->FR->DE |          DE |     2 |
|      IT | IT->FR->DE->PL |          PL |     3 |
|      IT | IT->FR->DE->DK |          DK |     3 |
|      PL |             PL |          PL |     0 |
|      PL |         PL->DE |          DE |     1 |
|      PL |     PL->DE->FR |          FR |     2 |
|      PL |     PL->DE->DK |          DK |     2 |
|      PL | PL->DE->FR->IT |          IT |     3 |

SQLFiddle here.

Store the query in a view and then you can filter on it, e.g.

SELECT * FROM my_view WHERE country = 'FR' AND steps = 2

Side-note on shortest paths:

If you actually did need the shortest paths between two countries, just reuse that view again picking an arbitrary path when two paths tie (again, this isn't the most efficient solution, don't do this for too many countries!):

SELECT 
  country,
  destination,
  MIN(steps) KEEP (DENSE_RANK FIRST ORDER BY steps) AS steps,
  MIN(path)  KEEP (DENSE_RANK FIRST ORDER BY steps) AS path
FROM paths
WHERE country != destination
GROUP BY country, destination
ORDER BY country, destination

And get:

| COUNTRY | DESTINATION | STEPS |           PATH |
|---------|-------------|-------|----------------|
|      DE |          DK |     1 |         DE->DK |
|      DE |          FR |     1 |         DE->FR |
|      DE |          IT |     2 |     DE->FR->IT |
|      DE |          PL |     1 |         DE->PL |
|      DK |          DE |     1 |         DK->DE |
|      DK |          FR |     2 |     DK->DE->FR |
|      DK |          IT |     3 | DK->DE->FR->IT |
|      DK |          PL |     2 |     DK->DE->PL |
|      FR |          DE |     1 |         FR->DE |
|      FR |          DK |     2 |     FR->DE->DK |
|      FR |          IT |     1 |         FR->IT |
|      FR |          PL |     2 |     FR->DE->PL |
|      IT |          DE |     2 |     IT->FR->DE |
|      IT |          DK |     3 | IT->FR->DE->DK |
|      IT |          FR |     1 |         IT->FR |
|      IT |          PL |     3 | IT->FR->DE->PL |
|      PL |          DE |     1 |         PL->DE |
|      PL |          DK |     2 |     PL->DE->DK |
|      PL |          FR |     2 |     PL->DE->FR |
|      PL |          IT |     3 | PL->DE->FR->IT |

As can be seen in this SQL Fiddle, or again, with a bit more data.

like image 101
Lukas Eder Avatar answered Nov 14 '22 06:11

Lukas Eder


You can aggregate the neighbours of each country into a collection and then use a simple hierarchical query to find the (non-cyclic) paths:

SQL Fiddle

Oracle 11g R2 Schema Setup:

create table borders (border char(2), neighbor char(2));

insert into borders values ('FR','DE');
insert into borders values ('FR','IT');
insert into borders values ('IT','FR');
insert into borders values ('DE','FR');
insert into borders values ('DE','PL');
insert into borders values ('PL','DE');
insert into borders values ('DE','DK');
insert into borders values ('DK','DE');
insert into borders values ('RU','PL');
insert into borders values ('PL','RU');
insert into borders values ('IT','CH');
insert into borders values ('FR','CH');
insert into borders values ('DE','CH');
insert into borders values ('CH','IT');
insert into borders values ('CH','FR');
insert into borders values ('CH','DE');

CREATE TYPE countrylist AS TABLE OF CHAR(2);

Query 1:

WITH neighbors ( country, neighbors ) AS (
  SELECT border,
         CAST( COLLECT( neighbor ) AS COUNTRYLIST )
  FROM   borders
  GROUP BY border
)
SELECT SYS_CONNECT_BY_PATH( country, '->' ) AS path,
       CONNECT_BY_ROOT( country ) AS origin,
       country AS destination,
       LEVEL - 1 AS path_length
FROM   neighbors
WHERE  LEVEL - 1 = 4      -- Remove this to find paths of any length
START WITH country = 'FR' -- Remove this to start from any country
CONNECT BY NOCYCLE
       country MEMBER OF PRIOR neighbors

Results:

|                 PATH | ORIGIN | DESTINATION | PATH_LENGTH |
|----------------------|--------|-------------|-------------|
| ->FR->CH->DE->PL->RU |     FR |          RU |           4 |
| ->FR->IT->CH->DE->DK |     FR |          DK |           4 |
| ->FR->IT->CH->DE->PL |     FR |          PL |           4 |

Explain Plan:

-----------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name    | Rows | Bytes | Cost | Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |         |   16 |   608 |    5 | 00:00:01 |
| * 1 |   FILTER                                   |         |      |       |      |          |
| * 2 |    CONNECT BY NO FILTERING WITH START-WITH |         |      |       |      |          |
|   3 |     VIEW                                   |         |   16 |   608 |    4 | 00:00:01 |
|   4 |      SORT GROUP BY                         |         |   16 |   128 |    4 | 00:00:01 |
|   5 |       TABLE ACCESS FULL                    | BORDERS |   16 |   128 |    3 | 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 1 - filter(LEVEL-1=4)
* 2 - filter("COUNTRY"MEMBER OFPRIOR "NEIGHBORS" AND "COUNTRY"='FR')

Query 2 This will get the shortest path (with ties) between pairs of countries:

WITH neighbors ( country, neighbors ) AS (
  SELECT border,
         CAST( COLLECT( neighbor ) AS COUNTRYLIST )
  FROM   borders
  GROUP BY border
)
SELECT path,
       origin,
       destination,
       path_length
FROM   (
  SELECT SYS_CONNECT_BY_PATH( country, '->' ) AS path,
         CONNECT_BY_ROOT( country ) AS origin,
         country AS destination,
         LEVEL - 1 AS path_length,
         RANK() OVER (
           PARTITION BY CONNECT_BY_ROOT( country ), country
           ORDER BY LEVEL ASC
         ) AS path_length_rank
  FROM   neighbors
  WHERE  LEVEL > 1
  CONNECT BY NOCYCLE
         country MEMBER OF PRIOR neighbors
  ORDER BY origin, destination
)
WHERE  path_length_rank = 1

Results:

|                 PATH | ORIGIN | DESTINATION | PATH_LENGTH |
|----------------------|--------|-------------|-------------|
|             ->CH->DE |     CH |          DE |           1 |
|         ->CH->DE->DK |     CH |          DK |           2 |
|             ->CH->FR |     CH |          FR |           1 |
|             ->CH->IT |     CH |          IT |           1 |
|         ->CH->DE->PL |     CH |          PL |           2 |
|     ->CH->DE->PL->RU |     CH |          RU |           3 |
|             ->DE->CH |     DE |          CH |           1 |
|             ->DE->DK |     DE |          DK |           1 |
|             ->DE->FR |     DE |          FR |           1 |
|         ->DE->FR->IT |     DE |          IT |           2 |
|         ->DE->CH->IT |     DE |          IT |           2 |
|             ->DE->PL |     DE |          PL |           1 |
|         ->DE->PL->RU |     DE |          RU |           2 |
|         ->DK->DE->CH |     DK |          CH |           2 |
|             ->DK->DE |     DK |          DE |           1 |
|         ->DK->DE->FR |     DK |          FR |           2 |
|     ->DK->DE->FR->IT |     DK |          IT |           3 |
|     ->DK->DE->CH->IT |     DK |          IT |           3 |
|         ->DK->DE->PL |     DK |          PL |           2 |
|     ->DK->DE->PL->RU |     DK |          RU |           3 |
|             ->FR->CH |     FR |          CH |           1 |
|             ->FR->DE |     FR |          DE |           1 |
|         ->FR->DE->DK |     FR |          DK |           2 |
|             ->FR->IT |     FR |          IT |           1 |
|         ->FR->DE->PL |     FR |          PL |           2 |
|     ->FR->DE->PL->RU |     FR |          RU |           3 |
|             ->IT->CH |     IT |          CH |           1 |
|         ->IT->FR->DE |     IT |          DE |           2 |
|         ->IT->CH->DE |     IT |          DE |           2 |
|     ->IT->CH->DE->DK |     IT |          DK |           3 |
|     ->IT->FR->DE->DK |     IT |          DK |           3 |
|             ->IT->FR |     IT |          FR |           1 |
|     ->IT->CH->DE->PL |     IT |          PL |           3 |
|     ->IT->FR->DE->PL |     IT |          PL |           3 |
| ->IT->FR->DE->PL->RU |     IT |          RU |           4 |
| ->IT->CH->DE->PL->RU |     IT |          RU |           4 |
|         ->PL->DE->CH |     PL |          CH |           2 |
|             ->PL->DE |     PL |          DE |           1 |
|         ->PL->DE->DK |     PL |          DK |           2 |
|         ->PL->DE->FR |     PL |          FR |           2 |
|     ->PL->DE->CH->IT |     PL |          IT |           3 |
|     ->PL->DE->FR->IT |     PL |          IT |           3 |
|             ->PL->RU |     PL |          RU |           1 |
|     ->RU->PL->DE->CH |     RU |          CH |           3 |
|         ->RU->PL->DE |     RU |          DE |           2 |
|     ->RU->PL->DE->DK |     RU |          DK |           3 |
|     ->RU->PL->DE->FR |     RU |          FR |           3 |
| ->RU->PL->DE->FR->IT |     RU |          IT |           4 |
| ->RU->PL->DE->CH->IT |     RU |          IT |           4 |
|             ->RU->PL |     RU |          PL |           1 |

Explain Plan:

---------------------------------------------------------------------------------------
| Id  | Operation                          | Name    | Rows | Bytes | Cost | Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |         |   16 | 32576 |    6 | 00:00:01 |
| * 1 |   VIEW                             |         |   16 | 32576 |    6 | 00:00:01 |
|   2 |    SORT ORDER BY                   |         |   16 |   608 |    6 | 00:00:01 |
| * 3 |     WINDOW SORT PUSHED RANK        |         |   16 |   608 |    6 | 00:00:01 |
| * 4 |      FILTER                        |         |      |       |      |          |
| * 5 |       CONNECT BY WITHOUT FILTERING |         |      |       |      |          |
|   6 |        VIEW                        |         |   16 |   608 |    4 | 00:00:01 |
|   7 |         SORT GROUP BY              |         |   16 |   128 |    4 | 00:00:01 |
|   8 |          TABLE ACCESS FULL         | BORDERS |   16 |   128 |    3 | 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 1 - filter("PATH_LENGTH_RANK"=1)
* 3 - filter(RANK() OVER ( PARTITION BY ANY,"COUNTRY" ORDER BY LEVEL)<=1)
* 4 - filter(LEVEL>1)
* 5 - filter("COUNTRY"MEMBER OFPRIOR "NEIGHBORS")
like image 4
MT0 Avatar answered Nov 14 '22 06:11

MT0


Your BORDERS table contains reciprocal relationships (e.g. FR->DE, DE->FR) . This means you need to handle cycles. This is not straightforward at all, because you want to avoid say FR->DE->PL->DE at three degrees of separation.

I have here a recursive WITH clause, (so Oracle 11gR2 or later) which does this.

with hqry (path, nghbr, prev_bdr, root_bdr, lvl) as (
    select b.border
            , b.neighbor
            , b.border
            , b.border
            , 1 as lvl
    from borders b
    where b.border = 'FR'
    union all
    select hqry.path || '->' || b.border
           , b.neighbor
           , hqry.nghbr
           , hqry.root_bdr
           , hqry.lvl + 1
    from hqry
         join borders b on b.border = hqry.nghbr
    where b.neighbor != hqry.root_bdr
    and  b.neighbor != hqry.prev_bdr
    and hqry.lvl < 3  -- this is a nasty kludge, with more time I'd like to fix it
)
SEARCH DEPTH FIRST BY path SET order1
CYCLE path SET cycle TO 1 DEFAULT 0
select path || '->' ||  nghbr as end_path
from hqry
where hqry.lvl = 3
;

It takes five parameters

  • path - the previous chain of borders
  • nghbr - the current neighbor, stating with the root country i.e. France
  • prev_bdr - the immediate previous border to prevent FR->DE->PL->DE
  • root_bdr - the originating border to prevent FR->CH->IT->FR
  • lvl - to track the degrees of separation

Sample output for three degrees of separation:

FR->CH->DE->DK
FR->CH->DE->PL
FR->DE->CH->IT
FR->DE->PL->RU
FR->IT->CH->DE

There is a SQL Fiddle demo here. I added in a couple more countries: Switzerland adds some nasty edges cases.

Obviously the above output shows it doesn't enforce the shortest path algorithm, that is left as an exercise for the reader :) If you're interested in adding this - and you should be, because I think it's key to a robust solution - I suggest you real this post by Lucas Jellema.

like image 3
APC Avatar answered Nov 14 '22 04:11

APC