Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

finding shortest path up to ten degrees of separation

I have the following three tables in SQL:

select * from movie limit 2;

  id  |           title            | year | content_rating | duration |    lang    |       country        |  gross   |  budget  | director_id 
------+----------------------------+------+----------------+----------+------------+----------------------+----------+----------+-------------
  407 | 102 Dalmatians             | 2000 | G              |      100 | English    | USA                  | 66941559 | 85000000 |        2174
 3699 | 10 Cloverfield Lane        | 2016 | PG-13          |      104 | English    | USA                  | 71897215 | 15000000 |        1327
(2 rows)

select * from actor limit 3;

  id  |         name         | facebook_likes 
------+----------------------+----------------
  408 | Christian Bale       |          23000
 1430 | Donna Murphy         |            553
   66 | Robert Downey Jr.    |          21000
(3 rows)

select * from acting limit 3;

 movie_id | actor_id 
----------+----------
      407 |     2024
     3699 |     1841
     3016 |       11
(3 rows)

Given two actors a1 and a2, I want to find the shortest path between a1 and a2.

For example, let's say a1 = 'Tom Cruise' and a2 = 'Robert Downey Jr'.

The output should be

Tom Cruise was in Days of Thunder with Robert Duvall -> Robert Duvall was in Lucky You with Robert Downey Jr.

In this case, Tom Cruise was 2 degrees away from Robert Downey Jr, with Robert Durvall connecting them. At most, I'd like to output up to 10 degrees, and after that ignore any connections.

I tried implementing the solution SQL query 6 degrees of separation for network analysis using recursive CTE but I don't think I've applied it properly. Help is appreciated, thanks in advance :)

Attempted query:

with recursive cte as (
select actor.name, movie.title, 1 as level from movie
left join acting on acting.movie_id = movie.id 
left join actor on actor.id = acting.actor_id
where actor.name = 'Tom Cruise'
union  
select actor.name, movie.title, level+1 from movie
left join acting on acting.movie_id = movie.id 
left join actor on actor.id = acting.actor_id
inner join cte on cte.name = actor.name
where cte.name = actor.name and cte.level < 10
)
select * from cte
like image 581
novice Avatar asked Apr 16 '19 22:04

novice


1 Answers

I'm not sure what your second select in the query would return but here's a way to get the degrees of separation between actors:

Let's say we have a table of actor ids, Origin. In order to get all the actors that have played in the same movie as one of the actors in our table, we need to start with Origin, join with Acting and then Movie in order to get all the movies that our origin actors have played in, and then join with Acting again and the Actor table to get what we want. Notice that the Acting table appears two times. If we apply this to the recursive CTE and your question, noting that the Origin table would be Cte in your example, we get the following:

WITH RECURSIVE cte(id, distance) AS (
    SELECT actor.id, 0 
    FROM actor
    WHERE actor.name = 'Tom Cruise'

    UNION

    SELECT DISTINCT actor.id, cte.distance + 1
    FROM cte
    JOIN acting AS acting1 ON (cte.id = acting1.actor_id)
    JOIN movie ON (acting1.movie_id = movie.id)
    JOIN acting AS acting2 ON (movie.id = acting2.movie_id)
    JOIN actor ON (acting2.actor_id = actor.id)
    WHERE cte.id <> actor.id AND cte.distance + 1 <= 10
)

After this, the cte table will contain tuples of the type (id, dist), meaning that there exists a path from Tom Cruise to the actor with this id and with a distance of dist.

The DISTINCT is for efficiency reasons. There will be a lot of bad pairs in our Cte table (with the second value being larger than the true distance), especially if the actor graph is dense, but the correct tuple will be in the Cte table. By correct tuple I mean a tuple (actor, distance), such that distance is the shortest path between the starting actor (Tom Cruise, for instance) and this actor.

Edit: My bad, UNION does this already, so DISTINCT isn't needed for duplicates.

In order to get that distance, we add a select with a group by clause:

WITH RECURSIVE cte(id, distance) AS (
    SELECT actor.id, 0 
    FROM actor
    WHERE actor.name = 'Tom Cruise'

    UNION

    SELECT actor.id, cte.distance + 1
    FROM cte
    JOIN acting AS acting1 ON (cte.id = acting1.actor_id)
    JOIN movie ON (acting1.movie_id = movie.id)
    JOIN acting AS acting2 ON (movie.id = acting2.movie_id)
    JOIN actor ON (acting2.actor_id = actor.id)
    WHERE cte.id <> actor.id AND cte.distance + 1 <= 10
)
SELECT id, MIN(distance) AS distance
FROM cte
GROUP BY id
ORDER BY 2 ASC;

If you'd want to see the result for a given second actor, say Robert Downey Jr, then this would give you the answer regarding the degrees of separation:

WITH RECURSIVE cte(id, distance) AS (
    SELECT actor.id, 0 
    FROM actor
    WHERE actor.name = 'Tom Cruise'

    UNION

    SELECT actor.id, cte.distance + 1
    FROM cte
    JOIN acting AS acting1 ON (cte.id = acting1.actor_id)
    JOIN movie ON (acting1.movie_id = movie.id)
    JOIN acting AS acting2 ON (movie.id = acting2.movie_id)
    JOIN actor ON (acting2.actor_id = actor.id)
    WHERE cte.id <> actor.id AND cte.distance + 1 <= 10
), distance_table (id, distance) AS (
    SELECT id, MIN(distance) AS distance
    FROM cte
    GROUP BY id
)
SELECT 'Tom Cruise and ' || actor.name || ' are separated by ' ||
       COALESCE(TO_CHAR(distance_table.distance, '999999'), 'more than 10') || ' degrees of separation'
FROM actor
LEFT JOIN distance_table ON (actor.id = distance_table.id)
WHERE actor.name = 'Robert Downey Jr';

Although I don't think you'd generally want to compute this kind of information directly from the database, if you wanted to have a message telling the path between the actors, like the one you provided (Tom Cruise was in Days of Thunder with Robert Duvall -> Robert Duvall was in Lucky You with Robert Downey Jr.), then something like this could return that:

WITH RECURSIVE cte(id, name, distance, message) AS (
    SELECT actor.id, actor.name, 0, ''
    FROM actor
    WHERE actor.name = 'Tom Cruise'

    UNION

    SELECT actor.id, actor.name, cte.distance + 1, 
           cte.message || '> ' || cte.name || ' was in ' ||
           movie.title || ' with ' || actor.name || ' '
    FROM cte
    JOIN acting AS acting1 ON (cte.id = acting1.actor_id)
    JOIN movie ON (acting1.movie_id = movie.id)
    JOIN acting AS acting2 ON (movie.id = acting2.movie_id)
    JOIN actor ON (acting2.actor_id = actor.id)
    WHERE cte.id <> actor.id AND cte.distance + 1 <= 10
), distance_table (id, distance) AS (
    SELECT id, MIN(distance) AS distance
    FROM cte
    GROUP BY id
)
SELECT id, name, message, distance
FROM cte
WHERE (id, distance) IN (SELECT * FROM distance_table)
ORDER BY distance;
like image 172
Burcea Bogdan Madalin Avatar answered Oct 26 '22 15:10

Burcea Bogdan Madalin