Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I efficiently do the intersection of joins in SQL?

I have three tables, books, tags, and taggings (books-xref-tags):

books
id | title |      author     
 1 | Blink | Malcolm Gladwell
 2 |  1984 |    George Orwell

taggings
book_id | tag_id
      1 |      1
      1 |      2
      2 |      1
      2 |      3

tags
id | name
 1 | interesting
 2 |  nonfiction
 3 |     fiction

I'd like to search for all books tagged both "interesting" and "fiction." The best I've come up with is

select books.* from books, taggings, tags
 where taggings.book_id = books.id
   and taggings.tag_id  = tag.id
   and tag.name = "interesting"
intersect
select books.* from books, taggings, tags
 where taggings.book_id = books.id
   and taggings.tag_id  = tag.id
   and tag.name = "fiction"

That seems to work, but I'm not sure how it will scale, either in rows or number of tags. That is, what happens as I add hundreds of books, hundreds of tags, and thousands of taggings? What happens as the search becomes "'interesting' and 'fiction' and 'aquatic' and 'stonemasonry'"?

I have an alternative approach in mind if there's no better way of doing the query directly in SQL:

  1. select all books with the first tag, along with all of those books' tags
  2. remove any from the list that don't have all of the tags queried
like image 521
James A. Rosen Avatar asked Dec 04 '25 10:12

James A. Rosen


2 Answers

If you want to keep the option of using more than two tags, this answer to a similar could be interesting for you.

It uses MySQL syntax (not sure what you use), but it is quite simple and you should be able to use it with other databases.

This would look like that for you (using MySQL syntax):

SELECT books.id, books.title, books.author
FROM books
INNER JOIN taggings ON ( taggings.book_id = books.book_id )
INNER JOIN tags ON ( tags.tag_id = taggings.tag_id )
WHERE tags.name IN ( @tag1, @tag2, @tag3 )
GROUP BY books.id, books.title, books.author
HAVING COUNT(*) = @number_of_tags

From my other post:

If you have 3 tags like in your example then number_of_tags would have to be 3, and the join would result in 3 rows per id that matches.

You can either create that query dynamically, or define it with, say, 10 tags and initialize them with a value that won't occur in tags.

like image 164
Peter Lang Avatar answered Dec 07 '25 08:12

Peter Lang


I would recommend the ALL instead of intersect since mysql actually knows how to join this much better though I lack the proper benchmarks.

select books.* from books, taggings, tags
 where taggings.book_id = books.id
   and taggings.tag_id  = tag.id
   and tag.name ALL("interesting", "fiction");

As to it's scaling, with millions of books and a low cardinality on the tags table what you're going to end up doing is migrating the tag id into the code/memory so that you're using taggings.tag_id ALL(3, 7, 105) or something. That last join to get the tags table on isn't going to use an index unless you get over like 1k tags so you're going to do a table scan each time.

In my experience joins, intersections and unions are huge evils for performance. Mostly joins are the problem we commonly experience. The less joins you have, the faster you'll end up getting.

like image 27
Chuck Vose Avatar answered Dec 07 '25 08:12

Chuck Vose



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!