I'm pretty much looking for a way to filter out rows from a SELECT
of one table based on certain values in rows of another table.
I'm experimenting with the example structure below. I've got a table of blog post content (one row per blog post), and another table of metadata about the posts (one row per key-value pair; each row with a column associating it with a blog post; many rows per blog post). I want to pull a row of posts
only if there exists no rows in metadata
where metadata.pid=posts.pid AND metadata.k='optout'
. That is, for the example structure below, I just want to get back the posts.id=1
row.
(Based on what I've tried) JOIN
s don't end up removing the posts which have some metadata where metadata.k='optout'
, because the other row of metadata for that pid
means it makes it into the results.
mysql> select * from posts;
+-----+-------+--------------+
| pid | title | content |
+-----+-------+--------------+
| 1 | Foo | Some content |
| 2 | Bar | More content |
| 3 | Baz | Something |
+-----+-------+--------------+
3 rows in set (0.00 sec)
mysql> select * from metadata;
+------+-----+--------+-----------+
| mdid | pid | k | v |
+------+-----+--------+-----------+
| 1 | 1 | date | yesterday |
| 2 | 1 | thumb | img.jpg |
| 3 | 2 | date | today |
| 4 | 2 | optout | true |
| 5 | 3 | date | tomorrow |
| 6 | 3 | optout | true |
+------+-----+--------+-----------+
6 rows in set (0.00 sec)
A subquery can give me the inverse of what I want:
mysql> select posts.* from posts where pid = any (select pid from metadata where k = 'optout');
+-----+-------+--------------+
| pid | title | content |
+-----+-------+--------------+
| 2 | Bar | More content |
| 3 | Baz | Something |
+-----+-------+--------------+
2 rows in set (0.00 sec)
...but using pid != any (...)
gives me all 3 of the rows in posts, cause every single pid
has a metadata row where k!='optout'
.
Sounds like you want to do a LEFT JOIN
and then check for results in which the value of the joined table is NULL
, indicating that no such joined record exists.
For example:
SELECT * FROM posts
LEFT JOIN metadata ON (posts.pid = metadata.pid AND metadata.k = 'optout')
WHERE metadata.mdid IS NULL;
This will select any row from the table posts
for which no corresponding metadata
row exists with a value of k = 'optout'
.
edit: Worth noting that this is a key property of a left join and would not work with a regular join; a left join will always return values from the first table, even if no matching values exist in the joined table(s), allowing you to perform selections based on the absence of those rows.
edit 2: Let's clarify what's happening here with respect to the LEFT JOIN
versus the JOIN
(which I refer to as an INNER JOIN
for clarity but is interchangable in MySQL).
Suppose you run either of these two queries:
SELECT posts.*, metadata.mdid, metadata.k, metadata.v
FROM posts
INNER JOIN metadata ON posts.pid = metadata.pid;
or
SELECT posts.*, metadata.mdid, metadata.k, metadata.v
FROM posts
LEFT JOIN metadata ON posts.pid = metadata.pid;
Both queries produce the following result set:
+-----+-------+--------------+------+-------+-----------+
| pid | title | content | mdid | k | v |
+-----+-------+--------------+------+-------+-----------+
| 1 | Foo | Some content | 1 | date | yesterday |
| 1 | Foo | Some content | 2 | thumb | img.jpg |
+-----+-------+--------------+------+-------+-----------+
Now, let's suppose we modify the query to add the extra criteria for "optout" that was mentioned. First, the INNER JOIN
:
SELECT posts.*, metadata.mdid, metadata.k, metadata.v
FROM posts
INNER JOIN metadata ON (posts.pid = metadata.pid AND metadata.k = "optout");
As expected, this returns no results:
Empty set (0.00 sec)
Now, changing that to a LEFT JOIN
:
SELECT posts.*, metadata.mdid, metadata.k, metadata.v
FROM posts
LEFT JOIN metadata ON (posts.pid = metadata.pid AND metadata.k = "optout");
This DOES produce a result set:
+-----+-------+--------------+------+------+------+
| pid | title | content | mdid | k | v |
+-----+-------+--------------+------+------+------+
| 1 | Foo | Some content | NULL | NULL | NULL |
+-----+-------+--------------+------+------+------+
The difference between an INNER JOIN
and a LEFT JOIN
is that an INNER JOIN
will only return a result if rows from BOTH joined tables match. In a LEFT JOIN
, matching rows from the first table will ALWAYS be returned, regardless of whether anything is found to join to. In a lot of cases it doesn't matter which one you use, but it's important to choose the right one so as not to get unexpected results down the line.
So in this case, the suggested query of:
SELECT posts.*, metadata.mdid, metadata.k, metadata.v
LEFT JOIN metadata ON (posts.pid = metadata.pid AND metadata.k = 'optout')
WHERE metadata.mdid IS NULL;
Will return the same result set as above:
+-----+-------+--------------+------+------+------+
| pid | title | content | mdid | k | v |
+-----+-------+--------------+------+------+------+
| 1 | Foo | Some content | NULL | NULL | NULL |
+-----+-------+--------------+------+------+------+
Hopefully that clears it up! Joins are a great thing to learn about, having a full understanding of when to use which one is a very good thing.
You can try something like
select p.*
from posts p
where NOT EXISTS (
select pid
from metadata
where k = 'optout'
and pid = p.pid
)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With