Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What would be an optimal SQL query to retrieve the following result set

First of all, this is a homework assignment. I have already thought of and written a solution, but I am a bit aggravated by the fact that I am unable to profile it successfully or obtain a second opinion as to whether it is in fact a good one.

Say I have a simple table of film participation information (person, film, persons relationship to the film) like the following:

create table film
(
    person_name varchar(48) not null,
    film_title varchar(128) not null,
    relation varchar(48) not null
);

-- { 'Mel Gibson', 'Braveheart', 'director' }
-- { 'Mel Gibson', 'Braveheart', 'cast' }
-- { 'Steven Spielberg', 'A.I.' , 'director' }
-- { 'Hilary Swank', 'Million Dollar Baby', 'cast' }
-- etc

The database and the table are not created or maintained by me, I am simply to query information from it.

I need to produce a set of person names for directors who are cast (act in front of the camera) in each and every film they have directed. The condition should not hold for people who have directed at least one film in which they didn't act, or for people who haven't directed anything. Whether these directors also have relationships with films they didn't direct, is of no consequence here. Basically, the query can be expressed as "get me list of people who acted in every film they have directed".

My query (that as far as I can attest produces correct result set), lo and behold:

(   
    select  person_name 
    from    film 
    where   relation = 'director'
)
except 
(
    select person_name
    from 
    (   
        (
            select  person_name, film_title 
            from    film 
            where   relation = 'director'
        ) 
        except 
        (   
            select  person_name, film_title 
            from    film 
            where   relation = 'cast'
        )
    ) as director_behind_camera_for_film
)

I would like to know whether the query is sound, or if I have been thinking completely wrong about this? If the latter, could you please provide me with a better solution or an explanation thereof?

Don't pay much attention to the fact that I use strings everywhere (where surrogate keys could have been used insted) - this is a simplified example, yet it still demonstrates my challenge.

like image 325
amn Avatar asked Nov 01 '11 22:11

amn


People also ask

Which type of SQL statement returns a result-set?

The SQL SELECT statement returns a result set of records, from one or more tables. A SELECT statement retrieves zero or more rows from one or more database tables or database views. In most applications, SELECT is the most commonly used data manipulation language (DML) command.

Which SQL function is used to sort the result-set?

The ORDER BY command is used to sort the result set in ascending or descending order.

What are the things used to optimize the query?

Query optimization is the overall process of choosing the most efficient means of executing a SQL statement. SQL is a nonprocedural language, so the optimizer is free to merge, reorganize, and process in any order. The database optimizes each SQL statement based on statistics collected about the accessed data.


1 Answers

SELECT tmp.person_name FROM
(
   SELECT person_name, film_title, COUNT(relationship) as cnt
   FROM film
   WHERE relationship IN ('cast', 'director')
   GROUP BY person_name, film_title
) as tmp
GROUP BY person_name
HAVING SUM(cnt) = COUNT(cnt)*2

or

SELECT tmp.person_name FROM
(
   SELECT person_name, film_title, COUNT(DISTINCT(relationship)) as cnt
   FROM film
   WHERE relationship IN ('cast', 'director')
   GROUP BY person_name, film_title
) as tmp
GROUP BY person_name
HAVING SUM(cnt) = COUNT(cnt)*2
like image 177
frail Avatar answered Jan 03 '23 13:01

frail