MySQL JOIN Abuse? How bad can it get?

I've been reading a lot about relational databases that use many JOIN statements in every SELECT. However, I've been wondering whether overusing JOINs causes performance problems in the long run.

For example, let's say we have a users table. I would usually put the "most used" data in it instead of doing extra JOINs. By "most used" data I mean, for instance, the username, display picture and location.

This data is always needed when displaying any user interaction on the website, for example on every JOIN from the comments table for articles. Instead of joining the users and users_profiles tables to get 'location' and 'display', I just use the information in the users table.
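To make the comparison concrete, here is a rough sketch of the two layouts. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it can be run as-is; every table name, column name and sample row is hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; all names and rows below are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Option A: the frequently used fields live on users itself.
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        username TEXT NOT NULL,
        display TEXT,
        location TEXT
    );
    -- Option B: the same fields sit in a separate 1:1 table.
    CREATE TABLE users_profiles (
        user_id INTEGER PRIMARY KEY REFERENCES users(id),
        display TEXT,
        location TEXT
    );
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        article_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL REFERENCES users(id),
        body TEXT NOT NULL
    );
    CREATE INDEX idx_comments_article ON comments(article_id);
    INSERT INTO users VALUES (1, 'mario', 'mario.png', 'Mexico');
    INSERT INTO users_profiles VALUES (1, 'mario.png', 'Mexico');
    INSERT INTO comments VALUES (1, 10, 1, 'Nice article');
""")

# Option A: one JOIN per comment listing.
one_join = conn.execute("""
    SELECT c.body, u.username, u.display, u.location
    FROM comments c
    JOIN users u ON u.id = c.user_id
    WHERE c.article_id = ?
""", (10,)).fetchall()

# Option B: a second JOIN just to fetch display and location.
two_joins = conn.execute("""
    SELECT c.body, u.username, p.display, p.location
    FROM comments c
    JOIN users u ON u.id = c.user_id
    JOIN users_profiles p ON p.user_id = u.id
    WHERE c.article_id = ?
""", (10,)).fetchall()
```

Both queries return the same rows; the only difference is the extra JOIN in option B.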

That's my approach; however, I know there are a lot of excellent and experienced programmers who can give me a word of advice on this matter.

My questions are:

Should I try to be conservative with JOINs, or should I use them more? Why?

Are there any performance problems in the long run when using JOINs a lot?

Note: I must clarify that I'm not trying to avoid JOINs altogether. I use them only when needed. In this example, that would be comment/article authors, or extra profile information that is only displayed on user profile pages, etc.

MarioRicalde asked Dec 05 '09 10:12



1 Answer

My advice on data modeling is:

  • You should generally favour optional (nullable) columns over 1:1 joins. There are still instances where 1:1 makes sense, usually revolving around subtyping. Oddly, people tend to be more squeamish about nullable columns than about joins;
  • Don't make a model too indirect unless really justified (more on this below);
  • Favour joins over aggregation. This can vary, so it needs to be tested. See Oracle vs MySQL vs SQL Server: Aggregation vs Joins for an example of this;
  • Joins are better than N+1 selects. An N+1 select is, for example, selecting an order from a database table and then issuing a separate query to get all the line items for that order;
  • The scalability of joins is usually only an issue when you're doing mass selects. If you select a single row and then join that to a few things, this is rarely a problem (but sometimes it is);
  • Foreign keys should always be indexed unless you're dealing with a trivially small table.
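To illustrate the N+1 point and the foreign key index, here is a minimal sketch. It runs on Python's built-in sqlite3 module rather than MySQL so it is self-contained; the orders and line_items tables, their columns and the sample rows are all hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; schema and data are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
    CREATE TABLE line_items (
        id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders(id),
        sku TEXT NOT NULL
    );
    -- Index the foreign key so line items can be looked up by order.
    CREATE INDEX idx_line_items_order_id ON line_items(order_id);
    INSERT INTO orders VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO line_items VALUES (1, 1, 'A-1'), (2, 1, 'A-2'), (3, 2, 'B-1');
""")

def n_plus_one(conn):
    """One query for the orders, then one more query per order."""
    queries = 1
    result = {}
    order_ids = conn.execute("SELECT id FROM orders ORDER BY id").fetchall()
    for (order_id,) in order_ids:
        rows = conn.execute(
            "SELECT sku FROM line_items WHERE order_id = ? ORDER BY id",
            (order_id,),
        ).fetchall()
        queries += 1
        result[order_id] = [sku for (sku,) in rows]
    return result, queries

def single_join(conn):
    """The same data fetched with a single joined query."""
    result = {}
    rows = conn.execute("""
        SELECT o.id, li.sku
        FROM orders o
        JOIN line_items li ON li.order_id = o.id
        ORDER BY o.id, li.id
    """)
    for order_id, sku in rows:
        result.setdefault(order_id, []).append(sku)
    return result, 1

items_a, queries_a = n_plus_one(conn)
items_b, queries_b = single_join(conn)
```

With two orders, the first version issues three queries and the second issues one, and the gap grows with the number of orders. Note that the inner join leaves out orders that have no line items; a LEFT JOIN would be needed if those must appear.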

More in Database Development Mistakes Made by AppDevelopers.

Now as for directness of a model, let me give you an example. Let's say you're designing a system for authentication and authorization of users. An overengineered solution might look something like this:

  • Alias (id, username, user_id);
  • User (id, ...);
  • Email (id, user_id, email address);
  • Login (id, user_id, ...)
  • Login Roles (id, login_id, role_id);
  • Role (id, name);
  • Role Privilege (id, role_id, privilege_id);
  • Privilege (id, name).
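As a rough sketch, the lookup from a username to its privileges in this model could be written as below. It uses Python's built-in sqlite3 module instead of MySQL so it runs as-is; the snake_case table names, the column types and the sample rows are assumptions, and the Email table is left out because the lookup never touches it.

```python
import sqlite3

# SQLite stands in for MySQL; names, types and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE alias (id INTEGER PRIMARY KEY, username TEXT, user_id INTEGER);
    CREATE TABLE login (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE login_role (id INTEGER PRIMARY KEY, login_id INTEGER, role_id INTEGER);
    CREATE TABLE role (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE role_privilege (id INTEGER PRIMARY KEY, role_id INTEGER, privilege_id INTEGER);
    CREATE TABLE privilege (id INTEGER PRIMARY KEY, name TEXT);

    INSERT INTO users VALUES (1);
    INSERT INTO alias VALUES (1, 'alice', 1);
    INSERT INTO login VALUES (1, 1);
    INSERT INTO role VALUES (1, 'editor');
    INSERT INTO login_role VALUES (1, 1, 1);
    INSERT INTO privilege VALUES (1, 'read'), (2, 'edit');
    INSERT INTO role_privilege VALUES (1, 1, 1), (2, 1, 2);
""")

# Every hop in the model becomes one JOIN in the query.
PRIVILEGES_FOR_USERNAME = """
    SELECT p.name
    FROM alias a
    JOIN users u ON u.id = a.user_id
    JOIN login l ON l.user_id = u.id
    JOIN login_role lr ON lr.login_id = l.id
    JOIN role r ON r.id = lr.role_id
    JOIN role_privilege rp ON rp.role_id = r.id
    JOIN privilege p ON p.id = rp.privilege_id
    WHERE a.username = ?
    ORDER BY p.name
"""

privileges = [name for (name,) in conn.execute(PRIVILEGES_FOR_USERNAME, ("alice",))]
```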

So you need 6 joins to get from the username entered to the actual privileges. Sure, there might be an actual requirement for this, but more often than not this kind of system is put in because some hand-wringing developer thinks they might someday need it, even though every user only has one alias, user to login is 1:1, and so on. A simpler solution is:

  • User (id, username, email address, user type)

and, well, that's it. Perhaps you do need a complex role system, but it's also quite possible that you don't. If you do, it's reasonably easy to slot in (user type becomes a foreign key into a user types or roles table), and it's generally straightforward to map the old to the new.
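Here is one way that later migration might look, as a sketch only. It again uses Python's built-in sqlite3 module in place of MySQL; the user_types table, the user_type_id column and the sample users are hypothetical, and a real MySQL migration would use its own ALTER TABLE syntax.

```python
import sqlite3

# SQLite stands in for MySQL; all names and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        username TEXT NOT NULL UNIQUE,
        email_address TEXT,
        user_type TEXT NOT NULL
    );
    INSERT INTO users (username, email_address, user_type) VALUES
        ('alice', 'alice@example.com', 'admin'),
        ('bob', 'bob@example.com', 'member');
""")

# Simple model: the user type is read straight off the row, no joins.
before = conn.execute(
    "SELECT username, user_type FROM users ORDER BY id"
).fetchall()

# Later, if roles are really needed: add a lookup table and map old to new.
conn.executescript("""
    CREATE TABLE user_types (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    INSERT INTO user_types (name) SELECT DISTINCT user_type FROM users;
    ALTER TABLE users ADD COLUMN user_type_id INTEGER REFERENCES user_types(id);
    UPDATE users SET user_type_id =
        (SELECT id FROM user_types WHERE name = users.user_type);
    CREATE INDEX idx_users_user_type_id ON users(user_type_id);
""")

# After the migration the type name comes from one indexed join.
after = conn.execute("""
    SELECT u.username, t.name
    FROM users u
    JOIN user_types t ON t.id = u.user_type_id
    ORDER BY u.id
""").fetchall()
```

The old user_type column can be dropped once nothing reads it any more.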

This is the thing about complexity: it's easy to add and hard to remove. Usually it's a constant vigil against unintended complexity, which is bad enough without going and making it worse by adding unnecessary complexity.

cletus answered Oct 23 '22 23:10