I have a simple question about the most efficient way to perform a particular join.
Take these three tables, real names have been changed to protect the innocent:
Table: animal
animal_id name ... ====================== 1 bunny 2 bear 3 cat 4 mouse
Table: tags
tag_id tag ================== 1 fluffy 2 brown 3 cute 4 small
Mapping Table: animal_tag
animal_id tag_id ================== 1 1 1 2 1 3 2 2 3 4 4 2
I want to find all animals that are tagged as 'fluffy', 'brown', and 'cute'. That is to say that the animal must be tagged with all three. In reality, the number of required tags can vary, but should be irrelevant for this discussion. This is the query I came up with:
SELECT * FROM animal
JOIN (
SELECT at.animal_id FROM animal_tag at
WHERE at.tag_id IN (
SELECT tg.tag_id FROM tag tg
WHERE tg.tag='fluffy' OR tg.tag='brown' OR tg.tag='cute'
)
GROUP BY at.animal_id HAVING COUNT(at.tag_id)=3
) AS jt
ON animal.animal_id=jt.animal_id
On a table with thousands 'animals' and and hundreds of 'tags', this query performs respectably ... 10s of milliseconds. However, when i look at the query plan (Apache Derby is the DB), the optimizer's estimated cost is pretty high (9945.12) and the plan pretty extensive. For a query this "simple" I usually try to get query plans with an estimated cost of single or double digits.
So my question is, is there a better way to perform this query? Seems like a simple query, but I've been stumped coming up with anything better.
You could create a temp table using DECLARE GLOBAL TEMPORARY TABLE And then do an INNER JOIN to eliminate the "WHERE IN". Working with Joins which are set based is usually far more efficient than Where statements that have to be evaluated for each row.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With