I am trying to stick to the practice of keeping the database normalized, but that leads to the need to run multiple join queries. Is there a performance degradation if many queries use joins vs having a call to a single table that might contain redundant data?
Even though the join order has no impact on the final result, it still affects performance. The optimizer will therefore evaluate the possible join-order permutations and select the best one, which means that merely optimizing a complex statement can itself become a performance problem.
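As a small illustration (the tables and columns here are invented for the sketch), the three-table join below can be executed in several orders without changing the result, but each order can have a very different cost, and the optimizer has to weigh them:

-- Hypothetical schema: customers, orders, order_items.
-- The optimizer may start with whichever table it estimates will shrink the
-- intermediate result fastest; the final rows are the same either way.
SELECT c.name, o.order_date, i.product_id
FROM customers   c
JOIN orders      o ON o.customer_id = c.customer_id
JOIN order_items i ON i.order_id    = o.order_id
WHERE c.country = 'DE';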
If the joining column is UNIQUE and marked as such, both these queries yield the same plan in SQL Server. If it's not, then IN is faster than JOIN on DISTINCT.
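The two queries being compared are not shown in this excerpt; a plausible reconstruction (table and column names assumed) is an IN subquery versus a JOIN followed by DISTINCT:

-- Variant 1: semi-join expressed with IN
SELECT p.*
FROM products p
WHERE p.category_id IN (SELECT c.category_id
                        FROM categories c
                        WHERE c.active = 1);

-- Variant 2: JOIN, with DISTINCT to remove duplicates the join may introduce
SELECT DISTINCT p.*
FROM products p
JOIN categories c ON c.category_id = p.category_id
WHERE c.active = 1;

-- If categories.category_id is declared UNIQUE (e.g. as a primary key),
-- SQL Server can compile both to the same plan; if it is not, the IN form
-- avoids the extra de-duplication step that DISTINCT requires.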
Joins: If your query joins two tables in a way that substantially increases the row count of the result set, your query is likely to be slow. There's an example of this in the subqueries lesson. Aggregations: Combining multiple rows to produce a result requires more computation than simply retrieving those rows.
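For instance (schema invented for illustration), joining a one-row-per-user table to an events table that holds many rows per user multiplies the rows the engine has to carry through the rest of the query:

-- users has one row per user; events may hold thousands of rows per user.
-- The join produces one output row per (user, event) pair, so the
-- intermediate result can be far larger than either input table.
SELECT u.user_id, u.signup_date, e.event_type, e.occurred_at
FROM users  u
JOIN events e ON e.user_id = u.user_id;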
The problem is that joins are relatively slow, especially over very large data sets, and if they are slow your website is slow. It takes a long time to get all those separate bits of information off disk and put them all together again.
Keep the Database normalised UNTIL you have discovered a bottleneck. Then only after careful profiling should you denormalise.
In most instances, having a good covering set of indexes and up-to-date statistics will solve most performance and blocking issues without any denormalisation.
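As a sketch of what a covering index looks like (the index, table, and column names are hypothetical, and the INCLUDE syntax shown is SQL Server's):

-- Covers a query of the form:
--   SELECT order_date, total FROM orders WHERE customer_id = @id;
-- The key column supports the seek, and the INCLUDEd columns let the query
-- be answered from the index alone, without a lookup into the base table.
CREATE NONCLUSTERED INDEX IX_orders_customer
    ON orders (customer_id)
    INCLUDE (order_date, total);

-- Keeping statistics current helps the optimizer estimate row counts accurately.
UPDATE STATISTICS orders;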
Using a single table could lead to worse performance if there are writes as well as reads against it.
Michael Jackson (not that one) is famously believed to have said:
The First Rule of Program Optimization: Don't do it.
The Second Rule of Program Optimization (for experts only!): Don't do it yet.
That was probably before RDBMSs were around, but I think he'd have extended the Rules to include them.
Multi-table SELECTs are almost always needed with a normalised data model; as is often the case with this kind of question, the "correct" answer to the "denormalise?" question depends on several factors.
DBMS platform
The relative performance of multi- vs single-table queries is influenced by the platform on which your application lives: the level of sophistication of the query optimisers can vary. MySQL, for example, in my experience, is screamingly fast on single-table queries but doesn't optimise queries with multiple joins so well. This isn't a real issue with smaller tables (less than 10K rows, say) but really hurts with large (10M+) ones.
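One way to check how MySQL will handle a multi-join query (the query and table names here are assumed for illustration) is to look at the plan it produces:

-- EXPLAIN reports the join order MySQL chose, the indexes it intends to use,
-- and a rough estimate of how many rows it will examine per table.
EXPLAIN
SELECT c.name, o.order_date
FROM customers c
JOIN orders    o ON o.customer_id = c.customer_id
WHERE o.order_date >= '2024-01-01';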
Data volume
Unless you're looking at tables in the 100K+ row region, there pretty much shouldn't be a problem. If you're looking at table sizes in the hundreds of rows, I wouldn't even bother thinking about indexing.
(De-)normalisation
The whole point of normalisation is to minimise duplication, to try to ensure that any field value that must be updated need only be changed in one place. Denormalisation breaks that, which isn't much of a problem if updates to the duplicated data are rare (ideally they should never occur). So think very carefully before duplicating anything but the most static data. Note that your database may grow significantly.
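A minimal sketch of the trade-off (schema invented for illustration): in a normalised design a supplier's phone number lives in exactly one row, while in a denormalised design the same value is copied onto every product row and every copy must be kept in sync:

-- Normalised: the phone number exists once, so one row changes.
UPDATE suppliers
SET    phone = '+44 20 7946 0000'
WHERE  supplier_id = 42;

-- Denormalised: the same fact is duplicated across many product rows, so a
-- single change fans out into a multi-row update (and a chance to miss one).
UPDATE products
SET    supplier_phone = '+44 20 7946 0000'
WHERE  supplier_id = 42;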
Requirements/Constraints
What performance requirements are you trying to meet? Do you have fixed hardware or a budget? Sometimes a performance boost can be most easily - and even most cheaply - achieved by a hardware upgrade. What transaction volumes are you expecting? A small-business accounting system has a very different profile to, say, Twitter.
One last thought strikes me: if you denormalise enough, how is your database different from a flat file? SQL is superb for flexible data and multi-dimensional retrieval, but it can be an order of magnitude (at least) slower than a straight sequential or fairly simply indexed file.