Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

removing rows from a SELECT based on columns in a different table

I'm pretty much looking for a way to filter out rows from a SELECT of one table based on certain values in rows of another table.

I'm experimenting with the example structure below. I've got a table of blog post content (one row per blog post), and another table of metadata about the posts (one row per key-value pair; each row with a column associating it with a blog post; many rows per blog post). I want to pull a row of posts only if there exists no rows in metadata where metadata.pid=posts.pid AND metadata.k='optout'. That is, for the example structure below, I just want to get back the posts.id=1 row.

(Based on what I've tried) JOINs don't end up removing the posts which have some metadata where metadata.k='optout', because the other row of metadata for that pid means it makes it into the results.

mysql> select * from posts;
+-----+-------+--------------+
| pid | title | content      |
+-----+-------+--------------+
|   1 | Foo   | Some content |
|   2 | Bar   | More content |
|   3 | Baz   | Something    |
+-----+-------+--------------+
3 rows in set (0.00 sec)

mysql> select * from metadata;
+------+-----+--------+-----------+
| mdid | pid | k      | v         |
+------+-----+--------+-----------+
|    1 |   1 | date   | yesterday |
|    2 |   1 | thumb  | img.jpg   |
|    3 |   2 | date   | today     |
|    4 |   2 | optout | true      |
|    5 |   3 | date   | tomorrow  |
|    6 |   3 | optout | true      |
+------+-----+--------+-----------+
6 rows in set (0.00 sec)

A subquery can give me the inverse of what I want:

mysql> select posts.* from posts where pid = any (select pid from metadata where k = 'optout');
+-----+-------+--------------+
| pid | title | content      |
+-----+-------+--------------+
|   2 | Bar   | More content |
|   3 | Baz   | Something    |
+-----+-------+--------------+
2 rows in set (0.00 sec)

...but using pid != any (...) gives me all 3 of the rows in posts, cause every single pid has a metadata row where k!='optout'.

like image 920
alxndr Avatar asked Jan 12 '11 06:01

alxndr


2 Answers

Sounds like you want to do a LEFT JOIN and then check for results in which the value of the joined table is NULL, indicating that no such joined record exists.

For example:

SELECT * FROM posts 
LEFT JOIN metadata ON (posts.pid = metadata.pid AND metadata.k = 'optout')
WHERE metadata.mdid IS NULL;

This will select any row from the table posts for which no corresponding metadata row exists with a value of k = 'optout'.

edit: Worth noting that this is a key property of a left join and would not work with a regular join; a left join will always return values from the first table, even if no matching values exist in the joined table(s), allowing you to perform selections based on the absence of those rows.

edit 2: Let's clarify what's happening here with respect to the LEFT JOIN versus the JOIN (which I refer to as an INNER JOIN for clarity but is interchangable in MySQL).

Suppose you run either of these two queries:

SELECT posts.*, metadata.mdid, metadata.k, metadata.v 
FROM posts 
INNER JOIN metadata ON posts.pid = metadata.pid;

or

SELECT posts.*, metadata.mdid, metadata.k, metadata.v 
FROM posts 
LEFT JOIN metadata ON posts.pid = metadata.pid;

Both queries produce the following result set:

+-----+-------+--------------+------+-------+-----------+
| pid | title | content      | mdid | k     | v         |
+-----+-------+--------------+------+-------+-----------+
|   1 | Foo   | Some content |    1 | date  | yesterday |
|   1 | Foo   | Some content |    2 | thumb | img.jpg   |
+-----+-------+--------------+------+-------+-----------+

Now, let's suppose we modify the query to add the extra criteria for "optout" that was mentioned. First, the INNER JOIN:

SELECT posts.*, metadata.mdid, metadata.k, metadata.v 
FROM posts 
INNER JOIN metadata ON (posts.pid = metadata.pid AND metadata.k = "optout");

As expected, this returns no results:

Empty set (0.00 sec)

Now, changing that to a LEFT JOIN:

SELECT posts.*, metadata.mdid, metadata.k, metadata.v 
FROM posts 
LEFT JOIN metadata ON (posts.pid = metadata.pid AND metadata.k = "optout");

This DOES produce a result set:

+-----+-------+--------------+------+------+------+
| pid | title | content      | mdid | k    | v    |
+-----+-------+--------------+------+------+------+
|   1 | Foo   | Some content | NULL | NULL | NULL |
+-----+-------+--------------+------+------+------+

The difference between an INNER JOIN and a LEFT JOIN is that an INNER JOIN will only return a result if rows from BOTH joined tables match. In a LEFT JOIN, matching rows from the first table will ALWAYS be returned, regardless of whether anything is found to join to. In a lot of cases it doesn't matter which one you use, but it's important to choose the right one so as not to get unexpected results down the line.

So in this case, the suggested query of:

SELECT posts.*, metadata.mdid, metadata.k, metadata.v 
LEFT JOIN metadata ON (posts.pid = metadata.pid AND metadata.k = 'optout')
WHERE metadata.mdid IS NULL;

Will return the same result set as above:

+-----+-------+--------------+------+------+------+
| pid | title | content      | mdid | k    | v    |
+-----+-------+--------------+------+------+------+
|   1 | Foo   | Some content | NULL | NULL | NULL |
+-----+-------+--------------+------+------+------+

Hopefully that clears it up! Joins are a great thing to learn about, having a full understanding of when to use which one is a very good thing.

like image 148
futureal Avatar answered Oct 25 '22 22:10

futureal


You can try something like

select  p.* 
from    posts p
where   NOT EXISTS (
                        select  pid 
                        from    metadata 
                        where   k = 'optout' 
                        and     pid = p.pid
                    )
like image 24
Adriaan Stander Avatar answered Oct 25 '22 23:10

Adriaan Stander