I've found a similar thread but it doesn't really capture the essence of what I'm trying to ask - so I've created a new thread.
I know there is a trade-off between normalization and performance, and I'm wondering what's the best practice for drawing that line? In my particular situation, I have a messaging system that has three distinct tables: messages_threads (overarching message holder), messages_recipients (who is involved), and messages_messages (the actual messages + timestamps).
In order to return the "inbox" view, I have to left join the messages_threads table, users table, and pictures tables to the messages_recipients tables in order to get the information to populate the view (profile picture, sender name, thread id)... and I've still got add a join to messages to retrieve the text from the last message in order to display a "preview" of the last message to the user.
My question is: How costly are JOINS in SQL to performance? I could, for instance, store the sender's name (which I have to left join from users to retrieve) under a field in the messages_threads table called "sendername" - but in terms of normalization I've always been taught to avoid data redundancy?
Where do you draw the line? Or am I overestimating how performance-hampering SQL joins are?
It's not possible, or useful, to answer a question about how costly joins are. A join is just a command in the SQL query, what the database does with that join is something completely different. What's expensive in a query is things like table scans, where the database has to read an entire table to locate some data.
Basically, join order DOES matter because if we can join two tables that will reduce the number of rows needed to be processed by subsequent steps, then our performance will improve.
INNER JOIN − returns rows when there is a match in both tables. Join (Inner, Left, Right and Full Joins) A SQL Join statement is used to combine data or rows from two or more tables based on a common field between them. Different types of Joins are: INNER JOIN. Therefore its more expensive.
“Is there a performance difference between putting the JOIN conditions in the ON clause or the WHERE clause in MySQL?” No, there's no difference. The following queries are algebraically equivalent inside MySQL and will have the same execution plan.
The best practice is to always start with 3NF, and then only consider denormalistion if you find a specific performance problem.
Performance is just one of the issues you have to deal with with databases. By duplicating data, you run the risk of allowing inconsistent data to be in your database, thus nullifying one of the core principles of relational databases, consistency (the C
in ACID
) a.
Yes, joins have a cost, there's no getting around that. However, the cost is usually a lot less than you'd think, and can often be swamped by other factors like network transmission times. By making sure the relevant columns are indexed properly, you can avoid a lot of those costs.
And, remember the optimisation mantra: measure, don't guess! And measure in a production-like environment. And keep measuring (and tuning) periodically - optimisation is only a set and forget operation if your schema and data never change (very unlikely).
a) Reversion for performance can usually be made safe by using triggers to maintain consistency. This will, of course, slow down your updates, but may still let your selects run faster.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With