Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL query to look for a Kevin Bacon number of 2

Tags:

Using the IMDB database, I have tables actor, casts, and movie, and I need to select actors with a Kevin Bacon number of 2. I thought this should do it, but I'm getting 0 rows returned. What is my error?

select fname, lname from actor join casts on pid=actor.id where actor.id in (     select a3.id  --actors who have a kb number of 2     from casts c3 join actor a3 on c3.pid=a3.id,     (      (select c1.mid --actors who have a kb number of 1      from (casts c1 join actor a1 on c1.pid=a1.id), (casts c2 join actor a2 on c2.pid=a2.id)      where c1.mid=c2.mid and a2.fname='Kevin' and a2.lname='Bacon')     )Level1 where c3.mid=Level1.mid ) and actor.id not in (select a4.id --and only a kb number of 2      from (casts c4 join actor a4 on c4.pid=a4.id), (casts c5 join actor a5 on c5.pid=a5.id)      where c4.mid=c5.mid and a5.fname='Kevin' and a5.lname='Bacon'); 

Here are the table schemas:

ACTOR (id, fname, lname, gender) MOVIE (id, name, year) CASTS (pid, mid, role) 

mid is a foreign key to a movie id and pid is a foreign key to actor id.

Note that restrictions on the question prohibit me from using temp tables or recursion: the query should be done with subselects.


I also tried

select count(distinct pid) from casts join actor on pid=actor.id where mid in (     select mid from casts where pid in (         select distinct pid from casts where mid in (             select mid from casts join actor on pid=actor.id where fname='Kevin' and lname='Bacon')))  and pid not in       (select distinct pid from casts where mid in (         select mid from casts join actor on pid=actor.id where fname='Kevin' and lname='Bacon')); 

which also seems like it should work, but it's not finishing.


I finally managed to get some working code:

select count(distinct pid) from casts where mid in (     select mid from casts where pid in (         select distinct pid from casts where mid in (             select mid from casts join actor on pid=actor.id where fname='Kevin' and lname='Bacon')))  and pid not in       (select distinct pid from casts where mid in (         select mid from casts join actor on pid=actor.id where fname='Kevin' and lname='Bacon')); 

The subqueries return sensible answers, at least. But it's taking forever. Each subquery took under 30 seconds, but together they're taking 6 minutes and counting. Why?


Note: This was given to me as homework. To avoid any semblence of academic misconduct on my part, I'd prefer if people didn't post complete/exact solutions, but rather pointed out general things that I'm doing wrong/make general suggestions as to how I should go about this.

like image 994
Colleen Avatar asked Feb 03 '12 22:02

Colleen


People also ask

How do I query a number in SQL?

If you'd like to number each row in a result set, SQL provides the ROW_NUMBER() function. This function is used in a SELECT clause with other columns. After the ROW_NUMBER() clause, we call the OVER() function. If you pass in any arguments to OVER , the numbering of rows will not be sorted according to any column.

How do you find the number of products in SQL?

COUNT(*) returns the number of items in a group. This includes NULL values and duplicates. COUNT(ALL expression) evaluates expression for each row in a group, and returns the number of nonnull values.

How do I find the number of employees in SQL?

You can use the COUNT function in the SELECT statement to get the number of employees, the number of employees in each department, the number of employees who hold a specific job, etc.


1 Answers

To give a sketch of a solution rather than an exact solution I would use this general approach

SELECT * FROM   ACTOR WHERE  id IN ( SELECT id         /* ... of actors that have worked on a film worked           on by actors that have worked on a KB film*/ EXCEPT SELECT id  /* ... of all actors that have worked on a KB film          including KB himself*/ ) 

Also as you are not allowed to use recursive CTEs anyway here's an answer using those.

WITH RecursiveCTE      AS (SELECT C.pid,                 C.mid,                 0 as Level          FROM   CASTS C                 JOIN ACTOR A                   ON A.id = C.pid          WHERE  A.fname = 'Kevin'                 and A.lname = 'Bacon'          UNION ALL          SELECT c1.pid,                 c2.mid,                 R.Level + 1          FROM   RecursiveCTE R                 JOIN CASTS c1                   ON c1.mid = R.mid                      AND R.Level < 2                 JOIN CASTS c2                   ON c1.pid = c2.pid) SELECT * FROM   ACTOR WHERE  id IN (SELECT pid               FROM   RecursiveCTE               GROUP  BY pid               HAVING MIN(Level) = 2)   
like image 89
Martin Smith Avatar answered Sep 28 '22 20:09

Martin Smith