Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Join between mapping (junction) table with specific cardinality

I have a simple question about the most efficient way to perform a particular join.

Take these three tables, real names have been changed to protect the innocent:

Table: animal

animal_id   name   ...
======================
1           bunny
2           bear
3           cat
4           mouse

Table: tags

tag_id     tag
==================
1          fluffy
2          brown
3          cute
4          small

Mapping Table: animal_tag

animal_id   tag_id
==================
1           1
1           2
1           3
2           2
3           4
4           2

I want to find all animals that are tagged as 'fluffy', 'brown', and 'cute'. That is to say that the animal must be tagged with all three. In reality, the number of required tags can vary, but should be irrelevant for this discussion. This is the query I came up with:

SELECT * FROM animal
JOIN (
      SELECT at.animal_id FROM animal_tag at
      WHERE at.tag_id IN (
                          SELECT tg.tag_id FROM tag tg
                          WHERE tg.tag='fluffy' OR tg.tag='brown' OR tg.tag='cute'
                          )
      GROUP BY at.animal_id HAVING COUNT(at.tag_id)=3
      ) AS jt
ON animal.animal_id=jt.animal_id

On a table with thousands 'animals' and and hundreds of 'tags', this query performs respectably ... 10s of milliseconds. However, when i look at the query plan (Apache Derby is the DB), the optimizer's estimated cost is pretty high (9945.12) and the plan pretty extensive. For a query this "simple" I usually try to get query plans with an estimated cost of single or double digits.

So my question is, is there a better way to perform this query? Seems like a simple query, but I've been stumped coming up with anything better.

like image 211
brettw Avatar asked Feb 07 '12 02:02

brettw


1 Answers

You could create a temp table using DECLARE GLOBAL TEMPORARY TABLE And then do an INNER JOIN to eliminate the "WHERE IN". Working with Joins which are set based is usually far more efficient than Where statements that have to be evaluated for each row.

like image 74
Dylan Bijnagte Avatar answered Nov 02 '22 23:11

Dylan Bijnagte