I have the following three tables in SQL:
select * from movie limit 2;
id | title | year | content_rating | duration | lang | country | gross | budget | director_id
------+----------------------------+------+----------------+----------+------------+----------------------+----------+----------+-------------
407 | 102 Dalmatians | 2000 | G | 100 | English | USA | 66941559 | 85000000 | 2174
3699 | 10 Cloverfield Lane | 2016 | PG-13 | 104 | English | USA | 71897215 | 15000000 | 1327
(2 rows)
select * from actor limit 3;
id | name | facebook_likes
------+----------------------+----------------
408 | Christian Bale | 23000
1430 | Donna Murphy | 553
66 | Robert Downey Jr. | 21000
(3 rows)
select * from acting limit 3;
movie_id | actor_id
----------+----------
407 | 2024
3699 | 1841
3016 | 11
(3 rows)
Given two actors a1
and a2
, I want to find the shortest path between a1
and a2
.
For example, let's say a1 = 'Tom Cruise'
and a2 = 'Robert Downey Jr'
.
The output should be
Tom Cruise was in Days of Thunder with Robert Duvall
-> Robert Duvall was in Lucky You with Robert Downey Jr.
In this case, Tom Cruise
was 2 degrees away from Robert Downey Jr
, with Robert Durvall
connecting them. At most, I'd like to output up to 10 degrees, and after that ignore any connections.
I tried implementing the solution SQL query 6 degrees of separation for network analysis using recursive CTE but I don't think I've applied it properly. Help is appreciated, thanks in advance :)
Attempted query:
with recursive cte as (
select actor.name, movie.title, 1 as level from movie
left join acting on acting.movie_id = movie.id
left join actor on actor.id = acting.actor_id
where actor.name = 'Tom Cruise'
union
select actor.name, movie.title, level+1 from movie
left join acting on acting.movie_id = movie.id
left join actor on actor.id = acting.actor_id
inner join cte on cte.name = actor.name
where cte.name = actor.name and cte.level < 10
)
select * from cte
I'm not sure what your second select in the query would return but here's a way to get the degrees of separation between actors:
Let's say we have a table of actor ids, Origin. In order to get all the actors that have played in the same movie as one of the actors in our table, we need to start with Origin, join with Acting and then Movie in order to get all the movies that our origin actors have played in, and then join with Acting again and the Actor table to get what we want. Notice that the Acting table appears two times. If we apply this to the recursive CTE and your question, noting that the Origin table would be Cte in your example, we get the following:
WITH RECURSIVE cte(id, distance) AS (
SELECT actor.id, 0
FROM actor
WHERE actor.name = 'Tom Cruise'
UNION
SELECT DISTINCT actor.id, cte.distance + 1
FROM cte
JOIN acting AS acting1 ON (cte.id = acting1.actor_id)
JOIN movie ON (acting1.movie_id = movie.id)
JOIN acting AS acting2 ON (movie.id = acting2.movie_id)
JOIN actor ON (acting2.actor_id = actor.id)
WHERE cte.id <> actor.id AND cte.distance + 1 <= 10
)
After this, the cte table will contain tuples of the type (id, dist), meaning that there exists a path from Tom Cruise to the actor with this id and with a distance of dist.
The DISTINCT is for efficiency reasons. There will be a lot of bad pairs in our Cte table (with the second value being larger than the true distance), especially if the actor graph is dense, but the correct tuple will be in the Cte table. By correct tuple I mean a tuple (actor, distance), such that distance is the shortest path between the starting actor (Tom Cruise, for instance) and this actor.
Edit: My bad, UNION does this already, so DISTINCT isn't needed for duplicates.
In order to get that distance, we add a select with a group by clause:
WITH RECURSIVE cte(id, distance) AS (
SELECT actor.id, 0
FROM actor
WHERE actor.name = 'Tom Cruise'
UNION
SELECT actor.id, cte.distance + 1
FROM cte
JOIN acting AS acting1 ON (cte.id = acting1.actor_id)
JOIN movie ON (acting1.movie_id = movie.id)
JOIN acting AS acting2 ON (movie.id = acting2.movie_id)
JOIN actor ON (acting2.actor_id = actor.id)
WHERE cte.id <> actor.id AND cte.distance + 1 <= 10
)
SELECT id, MIN(distance) AS distance
FROM cte
GROUP BY id
ORDER BY 2 ASC;
If you'd want to see the result for a given second actor, say Robert Downey Jr, then this would give you the answer regarding the degrees of separation:
WITH RECURSIVE cte(id, distance) AS (
SELECT actor.id, 0
FROM actor
WHERE actor.name = 'Tom Cruise'
UNION
SELECT actor.id, cte.distance + 1
FROM cte
JOIN acting AS acting1 ON (cte.id = acting1.actor_id)
JOIN movie ON (acting1.movie_id = movie.id)
JOIN acting AS acting2 ON (movie.id = acting2.movie_id)
JOIN actor ON (acting2.actor_id = actor.id)
WHERE cte.id <> actor.id AND cte.distance + 1 <= 10
), distance_table (id, distance) AS (
SELECT id, MIN(distance) AS distance
FROM cte
GROUP BY id
)
SELECT 'Tom Cruise and ' || actor.name || ' are separated by ' ||
COALESCE(TO_CHAR(distance_table.distance, '999999'), 'more than 10') || ' degrees of separation'
FROM actor
LEFT JOIN distance_table ON (actor.id = distance_table.id)
WHERE actor.name = 'Robert Downey Jr';
Although I don't think you'd generally want to compute this kind of information directly from the database, if you wanted to have a message telling the path between the actors, like the one you provided (Tom Cruise was in Days of Thunder with Robert Duvall -> Robert Duvall was in Lucky You with Robert Downey Jr.), then something like this could return that:
WITH RECURSIVE cte(id, name, distance, message) AS (
SELECT actor.id, actor.name, 0, ''
FROM actor
WHERE actor.name = 'Tom Cruise'
UNION
SELECT actor.id, actor.name, cte.distance + 1,
cte.message || '> ' || cte.name || ' was in ' ||
movie.title || ' with ' || actor.name || ' '
FROM cte
JOIN acting AS acting1 ON (cte.id = acting1.actor_id)
JOIN movie ON (acting1.movie_id = movie.id)
JOIN acting AS acting2 ON (movie.id = acting2.movie_id)
JOIN actor ON (acting2.actor_id = actor.id)
WHERE cte.id <> actor.id AND cte.distance + 1 <= 10
), distance_table (id, distance) AS (
SELECT id, MIN(distance) AS distance
FROM cte
GROUP BY id
)
SELECT id, name, message, distance
FROM cte
WHERE (id, distance) IN (SELECT * FROM distance_table)
ORDER BY distance;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With