I'm trying to determine the best general approach for querying two joined tables that each hold a lot of data, where each table has a column in the WHERE clause. Imagine a simple schema with two tables:
posts
    id (int)
    blog_id (int)
    published_date (datetime)
    title (varchar)
    body (text)
posts_tags
    post_id (int)
    tag_id (int)
With the following indexes:
posts: [blog_id, published_date]
posts_tags: [tag_id, post_id]
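For anyone who wants to reproduce this locally, the schema and indexes above could be set up roughly as follows. This is only a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The index names (idx_posts_blog_date, idx_posts_tags_tag_post), the VARCHAR length, and the primary key are assumptions for illustration, not part of the schema as described.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")

conn.executescript("""
CREATE TABLE posts (
    id             INTEGER PRIMARY KEY,   -- primary key is an assumption
    blog_id        INT,
    published_date DATETIME,
    title          VARCHAR(255),          -- length is an assumption
    body           TEXT
);

CREATE TABLE posts_tags (
    post_id INT,
    tag_id  INT
);

-- Index names are hypothetical; the column order follows the lists above.
CREATE INDEX idx_posts_blog_date     ON posts (blog_id, published_date);
CREATE INDEX idx_posts_tags_tag_post ON posts_tags (tag_id, post_id);
""")

# List the indexes that were just created.
index_names = sorted(
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND name LIKE 'idx_%'"
    )
)
print(index_names)
```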
We want to SELECT the 10 most recent posts on a given blog that were tagged with "foo". For the sake of this discussion, assume the blog has 10 million posts, and 1 million of those have been tagged with "foo". What is the most efficient way to query for this data?
The naive approach would be to do this:
SELECT
    id, blog_id, published_date, title, body
FROM
    posts p
INNER JOIN
    posts_tags pt
    ON pt.post_id = p.id
WHERE
    p.blog_id = 1
    AND pt.tag_id = 1
ORDER BY
    p.published_date DESC
LIMIT 10
MySQL will use our indexes, but will still end up scanning millions of records. Is there a more efficient way to retrieve this data without denormalizing the schema?
Any filter on a joined table should go in that join's ON clause. Strictly speaking, the WHERE clause should contain only conditions on the primary table or conditions that involve more than one table. This may not speed up every query, but the intent is to help MySQL optimize the query properly.
SELECT
    id, blog_id, published_date, title, body
FROM
    posts p
INNER JOIN
    posts_tags pt
    ON pt.post_id = p.id
    AND pt.tag_id = 1
WHERE
    p.blog_id = 1
ORDER BY
    p.published_date DESC
LIMIT 10
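One thing worth checking before relying on this: for an INNER JOIN, a filter in the ON clause and the same filter in the WHERE clause select the same rows, so any difference would have to come from the execution plan, not from the result. The sketch below checks only the result equivalence, on a small made-up data set, using Python's sqlite3 module as a stand-in for MySQL. The index names and the sample data are assumptions, and it says nothing about performance on a real MySQL server.

```python
import sqlite3
from datetime import datetime, timedelta

# In-memory SQLite database, used here as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (
    id INTEGER PRIMARY KEY, blog_id INT, published_date DATETIME,
    title VARCHAR(255), body TEXT
);
CREATE TABLE posts_tags (post_id INT, tag_id INT);
-- Hypothetical index names; column order follows the question.
CREATE INDEX idx_posts_blog_date     ON posts (blog_id, published_date);
CREATE INDEX idx_posts_tags_tag_post ON posts_tags (tag_id, post_id);
""")

# Made-up sample data: 100 posts, odd ids on blog 1 and even ids on blog 2,
# one post per day so that published_date is unique and increases with id.
start = datetime(2024, 1, 1)
for i in range(1, 101):
    published = (start + timedelta(days=i)).isoformat(sep=" ")
    conn.execute(
        "INSERT INTO posts VALUES (?, ?, ?, ?, ?)",
        (i, 1 if i % 2 else 2, published, f"post {i}", "..."),
    )
    if i % 3 == 0:
        conn.execute("INSERT INTO posts_tags VALUES (?, 1)", (i,))  # tag 1
    if i % 5 == 0:
        conn.execute("INSERT INTO posts_tags VALUES (?, 2)", (i,))  # tag 2

# Tag filter in the WHERE clause (the query from the question).
where_filter = """
SELECT p.id
FROM posts p
INNER JOIN posts_tags pt ON pt.post_id = p.id
WHERE p.blog_id = 1 AND pt.tag_id = 1
ORDER BY p.published_date DESC
LIMIT 10
"""

# Tag filter moved into the ON clause (the form suggested in this answer).
join_filter = """
SELECT p.id
FROM posts p
INNER JOIN posts_tags pt ON pt.post_id = p.id AND pt.tag_id = 1
WHERE p.blog_id = 1
ORDER BY p.published_date DESC
LIMIT 10
"""

rows_where = [row[0] for row in conn.execute(where_filter)]
rows_join = [row[0] for row in conn.execute(join_filter)]
print(rows_where == rows_join)
```

To see whether the two forms are actually planned differently on your own server, run MySQL's EXPLAIN on both queries and compare the output.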