
Can I expect a performance gain from removing this JOIN?

Tags:

join

mysql

I have an "items" table with 1 million rows and a "users" table with 20,000 rows. When I select from the "items" table, I join the "users" table (items.submitter_id = users.id) so that I can get the "username" from the users table.

I'm considering adding a username column to the items table and removing the join. Can I expect a decent performance increase from this? It's already quite fast, but it would be nice to decrease my load (which is pretty high).

The downside is that if the user changes their username, items will still reflect their old username, but this is okay with me if I can expect a decent performance increase.
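A minimal sketch of the change being considered, in MySQL syntax. The column type `VARCHAR(255)` is an assumption and should match whatever type `users.username` actually has; the link column `submitter_id` is taken from the example queries below.

```sql
-- Assumption: VARCHAR(255) mirrors the type of users.username; adjust to match.
ALTER TABLE items ADD COLUMN username VARCHAR(255) NULL;

-- One-off backfill: copy each submitter's current name onto their items.
-- MySQL's multi-table UPDATE syntax lets the join drive the copy.
UPDATE items AS i
JOIN users AS u ON i.submitter_id = u.id
SET i.username = u.username;
```

After this, the join-free query would add `Item.username` to its column list. On a million-row table both statements touch every row, so running them during a quiet period (or backfilling in batches by id range) may be kinder to a server that is already under load.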

I'm asking Stack Overflow because benchmarks aren't telling me much: both queries finish very quickly. Regardless, I'm wondering whether removing the join would lighten the load on the database to any significant degree.

Example query with join:
```sql
SELECT Item.id, Item.submitter_id, Item.source_image, Item.cached_image, Item.source_title,
       Item.source_url, Item.width, Item.height, Item.status, Item.popular, Item.made_popular,
       Item.fave_count, Item.tags, Item.user_art, Item.nudity, Item.created, Item.modified,
       Item.removed, Item.nofront, Item.test, Item.recs, Item.recs_data,
       User.id, User.username, User.password, User.email, User.fullname, User.profileurl,
       User.homepage, User.bio, User.location, User.avatar, User.ff_user, User.ff_key,
       User.ff_last_faveid, User.twitter_user, User.twitter_pass, User.emailalerts,
       User.showunsafe, User.view, User.fb_uid, User.fb_session, User.fb_avatar,
       User.twitter_uid, User.twitter_data, User.twitter_autopost, User.uri, User.created,
       User.modified
FROM items AS Item
LEFT JOIN users AS User ON (Item.submitter_id = User.id)
WHERE Item.nofront != 1 AND Item.removed != 1 AND Item.made_popular is not NULL AND nudity != 1
ORDER BY Item.made_popular DESC
LIMIT 1040, 290;
```

Example query without join:
```sql
SELECT Item.id, Item.submitter_id, Item.source_image, Item.cached_image, Item.source_title,
       Item.source_url, Item.width, Item.height, Item.status, Item.popular, Item.made_popular,
       Item.fave_count, Item.tags, Item.user_art, Item.nudity, Item.created, Item.modified,
       Item.removed, Item.nofront, Item.test, Item.recs, Item.recs_data
FROM items AS Item
WHERE Item.nofront != 1 AND Item.removed != 1 AND Item.made_popular is not NULL AND nudity != 1
ORDER BY Item.made_popular DESC
LIMIT 1040, 290;
```

makeee asked Apr 08 '10 03:04


2 Answers

The right answer is to measure it, in the target environment, to see if it makes a difference. Then do a cost/benefit analysis to see if it's worth it.
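When both variants are "very quick" on the wall clock, comparing how much work each one does can be more informative. A sketch using standard MySQL statements; the column list is cut down to two columns for readability, so for a fair comparison run it with the real column list, then repeat step 2 with the join removed. `FLUSH STATUS` needs the RELOAD privilege.

```sql
-- 1. Compare the execution plans of the two variants.
EXPLAIN
SELECT Item.id, User.username
FROM items AS Item
LEFT JOIN users AS User ON (Item.submitter_id = User.id)
WHERE Item.nofront != 1 AND Item.removed != 1
  AND Item.made_popular IS NOT NULL AND Item.nudity != 1
ORDER BY Item.made_popular DESC
LIMIT 1040, 290;

-- 2. Count the row reads the storage engine performs for one execution.
FLUSH STATUS;  -- resets this session's status counters to zero
SELECT Item.id, User.username
FROM items AS Item
LEFT JOIN users AS User ON (Item.submitter_id = User.id)
WHERE Item.nofront != 1 AND Item.removed != 1
  AND Item.made_popular IS NOT NULL AND Item.nudity != 1
ORDER BY Item.made_popular DESC
LIMIT 1040, 290;
SHOW SESSION STATUS LIKE 'Handler_read%';
```

The difference in `Handler_read_key` between the join and no-join runs gives a rough idea of how many extra primary-key lookups into `users` the join costs.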

The cost is the added storage and possibility of data getting out of sync (but see below on how to mitigate this). The benefit is the increased speed or reduced load.
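The storage part of that cost can be estimated before making the change. A rough sketch, ignoring per-row and index overhead; it weights each name by how many items its user actually has.

```sql
-- Approximate extra space if every item carried a copy of its submitter's name.
-- LENGTH() counts bytes. The +1 assumes a one-byte VARCHAR length prefix
-- (maximum column size of 255 bytes or less); use +2 otherwise.
SELECT SUM(LENGTH(u.username) + 1) / 1024 / 1024 AS approx_extra_mb
FROM items AS i
JOIN users AS u ON i.submitter_id = u.id;
```

For scale, if names averaged a hypothetical 10 bytes, a million items would need about 1,000,000 × 11 bytes ≈ 10.5 MB.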

Database schemas are not set-and-forget; they should be tuned periodically as the underlying data changes. That's what DBAs are paid for: continual monitoring and tuning.
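In MySQL, a common starting point for that kind of ongoing monitoring is the slow query log. A sketch; the one-second threshold is an arbitrary example, these statements need administrative privileges, and the settings are lost on restart unless they are also put in the server's option file.

```sql
-- Log statements that take longer than long_query_time seconds.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;  -- example threshold in seconds; applies to new connections
-- Optionally also log queries that use no index at all.
SET GLOBAL log_queries_not_using_indexes = 'ON';
-- Show where the log file is being written.
SHOW VARIABLES LIKE 'slow_query_log_file';
```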

In any case, the duplicated column can be kept consistent quite easily in a decent DBMS by using triggers. By that, I mean placing an insert/update trigger on the users table so that, if a user changes their user name, it's also changed in the items table (and possibly vice versa).
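MySQL has supported triggers since the 5.0 series, so for this schema the idea could be sketched as two triggers. Assumptions: `items` has gained the copied `username` column, the link column is `submitter_id` (as in the example queries), and the trigger names are made up. Each body is a single statement, so no `DELIMITER` change is needed.

```sql
-- When a user is renamed, rewrite the copy on all of their items.
-- <=> is MySQL's NULL-safe equality, so nothing is rewritten if the name is unchanged.
CREATE TRIGGER users_after_update_sync_username
AFTER UPDATE ON users
FOR EACH ROW
  UPDATE items
     SET username = NEW.username
   WHERE submitter_id = NEW.id
     AND NOT (NEW.username <=> OLD.username);

-- When an item is created, fill in the copy from the users table.
CREATE TRIGGER items_before_insert_set_username
BEFORE INSERT ON items
FOR EACH ROW
  SET NEW.username = (SELECT username FROM users WHERE id = NEW.submitter_id);
```

An index on `items.submitter_id` keeps the rename update from scanning the whole table. If an item's `submitter_id` can ever change, a similar `BEFORE UPDATE ON items` trigger would be needed as well.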

Whether MySQL meets my definition of a decent DBMS, I can't comment - I'm a DB2 bod myself. But reversion from third normal form is a tried and tested technique for wringing every last ounce of performance out of databases and, provided you understand the consequences, is quite acceptable. Very few people complain about their databases taking up too much disk space. Many complain about how slow their queries run.

Just keep in mind that reversion is something you do if and when you have a performance problem. It's not something that should be done just because you think it may reduce the load. Unless the load (or time taken) is actually a problem, the benefit part of your cost/benefit analysis is zero, so any sane bean counter will tell you that means "no change".


Based on your added queries, I have a couple of points to make:

  • First, the nudity column. Please tell me how I can get access to this database :-)
  • You should only extract the columns you need. If the user name is all you require from the User table, you should not be getting all that extra stuff in the first query. Possibly likewise for the Item stuff - only get what you need.
  • Make sure you have indexes on all columns used in WHERE clauses - this may also entail combination indexes (those with more than one column). What gets indexed depends on your queries but each column used in the WHERE clauses is a good start for analysis.
  • For large tables, you can consider "sweeping" removed items periodically into a separate table (e.g., RemovedItems) to minimise the size of Items and speed up the queries. But keep in mind this is only useful if you rarely need to look for removed items, since it will complicate those queries (by forcing them to search in two tables instead of one). Again, this is a cost/benefit thing. One million rows is not really that big a table (at least in my world).
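Applying the first two points to the example query might look like the sketch below. Which Item columns the page really needs is unknown, so the shortened column list is only illustrative, and the index name is hypothetical. The index leads with `made_popular` because the `!=` conditions cannot be used as index ranges, whereas an index on the ORDER BY column can let MySQL read rows already in sorted order and stop after the offset plus limit, instead of sorting the whole filtered set.

```sql
-- Fetch only what the page displays: the submitter's name rather than the
-- whole users row (password, email, session tokens and so on).
SELECT Item.id, Item.submitter_id, Item.cached_image, Item.source_title,
       Item.source_url, Item.width, Item.height, Item.made_popular,
       User.username
FROM items AS Item
LEFT JOIN users AS User ON (Item.submitter_id = User.id)
WHERE Item.nofront != 1
  AND Item.removed != 1
  AND Item.made_popular IS NOT NULL
  AND Item.nudity != 1
ORDER BY Item.made_popular DESC
LIMIT 1040, 290;

-- Candidate index on the ORDER BY column, if one does not already exist.
CREATE INDEX idx_items_made_popular ON items (made_popular);
```

Running EXPLAIN before and after adding the index shows whether MySQL actually uses it; "Using filesort" disappearing from the Extra column indicates the sort is being served by the index.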
paxdiablo answered Nov 07 '22 01:11


I suggest that you keep it that way to preserve a normalized schema. I think it's not a good idea to put usernames in the items table, since that makes the data redundant. Have you tried re-indexing your table?
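MySQL has no single "re-index" command; the closest equivalents are the two statements below. `ANALYZE TABLE` refreshes the index statistics the optimizer uses to choose plans, and `OPTIMIZE TABLE` rebuilds the table and its indexes, which can reclaim space after many deletes. Depending on the storage engine and MySQL version, `OPTIMIZE TABLE` can take a long time on a large table and may block writes while it runs, so it is best scheduled for a quiet period.

```sql
-- Refresh the optimizer's statistics for both tables used by the query.
ANALYZE TABLE items, users;

-- Rebuild items and its indexes (can be slow on a large table).
OPTIMIZE TABLE items;
```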

hallie answered Nov 07 '22 00:11