I've been reading a lot about relational databases that use many JOIN statements on every SELECT. However, I've been wondering whether there's any performance problem in the long run when overusing this method.
For example, let's say we have a users table. I would usually add the "most used" data to it instead of doing any extra JOINs. By "most used" data I mean, for instance, the username, display picture and location.
This data would always be needed when displaying any user interaction on the website, for example on every comments table JOIN for articles. Instead of doing a JOIN on the users & users_profiles tables to get the 'location' and 'display', I just use the information in the users table.
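To make the layout concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the real database. The column names, the bio field, the sample rows and the two helper functions are all made up for illustration; only the users, users_profiles and comments table names come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Frequently used fields live directly on users.
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        username TEXT NOT NULL,
        display TEXT,
        location TEXT
    );
    -- Rarely used fields stay in users_profiles (bio is a hypothetical column).
    CREATE TABLE users_profiles (
        user_id INTEGER PRIMARY KEY REFERENCES users(id),
        bio TEXT
    );
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        article_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL REFERENCES users(id),
        body TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'alice', 'alice.png', 'Lisbon');
    INSERT INTO users_profiles VALUES (1, 'Writes about databases.');
    INSERT INTO comments VALUES (10, 100, 1, 'Nice article');
""")

def comments_for_article(article_id):
    """Comment listing: one JOIN, because display and location are on users."""
    return conn.execute(
        """
        SELECT c.body, u.username, u.display, u.location
        FROM comments AS c
        JOIN users AS u ON u.id = c.user_id
        WHERE c.article_id = ?
        ORDER BY c.id
        """,
        (article_id,),
    ).fetchall()

def profile_page(user_id):
    """Profile page: the extra JOIN is paid only where the extra data is shown."""
    return conn.execute(
        """
        SELECT u.username, u.location, p.bio
        FROM users AS u
        JOIN users_profiles AS p ON p.user_id = u.id
        WHERE u.id = ?
        """,
        (user_id,),
    ).fetchone()
```

The comment listing touches two tables, and the profile page is the only place that joins users_profiles.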
That's my approach, however I do know that there are a lot of excellent and experienced programmers that can give me a word of advice about this matter.
My questions are:
Should I try to be conservative with the JOINs, or should I use them more? Why?
Are there any performance problems in the long run when using JOIN a lot?
Note: I must clarify that I'm not trying to avoid JOINs altogether. I use them only when needed. In this example, that would be for comment/article authors, extra profile information that is only displayed on user profile pages, etc.
Millions of rows is fine, tens of millions of rows is fine, provided you've got an even remotely decent server, i.e. a few GB of RAM and plenty of disk space. You will need to learn about indexes for fast retrieval, but in terms of MySQL being able to handle it, no problem.
JOINs in SQL are described as evil by junior developers who don't understand relational database management systems. JOINs incur a CPU cost, which is greatly emphasized when the data volume grows large and there are no indexes on the columns participating in the JOIN operations.
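As a rough illustration of the indexing point, the sketch below uses Python's built-in sqlite3 module (a stand-in here, not MySQL) to compare the query plan of a JOIN before and after indexing the joined column. The schema, the index name idx_comments_user_id and the join_plan helper are assumptions made for the example, and the exact plan wording differs between SQLite versions and from MySQL's EXPLAIN output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        body TEXT NOT NULL
    );
""")

QUERY = """
    SELECT u.username, c.body
    FROM users AS u
    JOIN comments AS c ON c.user_id = u.id
    WHERE u.id = ?
"""

def join_plan():
    """Return the human-readable steps SQLite plans to use for QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1,)).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description

plan_before = join_plan()

# Index the column that participates in the JOIN.
conn.execute("CREATE INDEX idx_comments_user_id ON comments (user_id)")

plan_after = join_plan()

print("before:", plan_before)
print("after: ", plan_after)
```

After the index is created, the plan should mention idx_comments_user_id, meaning the matching comments are looked up through the index rather than by reading the whole table.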
Method 1: Relational Algebra
Relational algebra is the most common way of writing a query and also the most natural way to do so. The code is clean, easy to troubleshoot, and unsurprisingly, it is also the most efficient way to join two tables.
The problem is that joins are relatively slow, especially over very large data sets, and if they are slow, your website is slow. It takes a long time to get all those separate bits of information off disk and put them all together again.
My advice on data modeling is:
More in Database Development Mistakes Made by Application Developers.
Now as for directness of a model, let me give you an example. Let's say you're designing a system for authentication and authorization of users. An overengineered solution might look something like this:
So you need 6 joins to get from the username entered to the actual privileges. Sure, there might be an actual requirement for this, but more often than not this kind of system is put in because of hand-wringing by some developer who thinks they might someday need it, even though every user has only one alias, user to login is 1:1, and so on. A simpler solution is:
and, well, that's it. Perhaps you need a complex role system, but it's also quite possible that you don't, and if you do, it's reasonably easy to slot in (user type becomes a foreign key into a user types or roles table), or it's generally straightforward to map the old to the new.
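A minimal sketch of that "slot it in later" step, assuming a hypothetical users table with a plain user_type column (the table layout, column names and sample rows are invented for illustration, and SQLite via Python's sqlite3 module stands in for the real database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Start simple: the user type is just a column on users.
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        username TEXT NOT NULL UNIQUE,
        user_type TEXT NOT NULL
    );
    INSERT INTO users (username, user_type) VALUES
        ('alice', 'admin'),
        ('bob', 'member'),
        ('carol', 'member');
""")

def user_type_for(username):
    """Simple model: no JOIN needed to find out what a user may do."""
    row = conn.execute(
        "SELECT user_type FROM users WHERE username = ?", (username,)
    ).fetchone()
    return row[0] if row else None

type_before = user_type_for("alice")

# Later, if roles really are needed: build a roles table from the existing
# values and point each user at it through a foreign key.
conn.executescript("""
    CREATE TABLE roles (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    INSERT INTO roles (name) SELECT DISTINCT user_type FROM users;
    ALTER TABLE users ADD COLUMN role_id INTEGER REFERENCES roles(id);
    UPDATE users
    SET role_id = (SELECT id FROM roles WHERE roles.name = users.user_type);
""")

def role_for(username):
    """Role model: one JOIN from users to roles."""
    row = conn.execute(
        """
        SELECT r.name
        FROM users AS u
        JOIN roles AS r ON r.id = u.role_id
        WHERE u.username = ?
        """,
        (username,),
    ).fetchone()
    return row[0] if row else None
```

The old column can be dropped once everything reads from roles; until then both lookups return the same answer, which makes the mapping from old to new easy to verify.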
This is the thing about complexity: it's easy to add and hard to remove. Usually it's a constant vigil against unintended complexity, which is bad enough without going and making it worse by adding unnecessary complexity.