I have an "items" table with 1 million rows and a "users" table with 20,000 rows. When I select from the "items" table I do a join on the "users" table (items.user_id = users.id), so that I can grab the "username" from the users table.
I'm considering adding a username column to the items table and removing the join. Can I expect a decent performance increase from this? It's already quite fast, but it would be nice to decrease my load (which is pretty high).
The downside is that if the user changes their username, items will still reflect their old username, but this is okay with me if I can expect a decent performance increase.
I'm asking Stack Overflow because benchmarks aren't telling me much. Both queries finish very quickly. Regardless, I'm wondering if removing the join would lighten the load on the database to any significant degree.
Example query with join:
SELECT Item.id, Item.submitter_id, Item.source_image, Item.cached_image,
       Item.source_title, Item.source_url, Item.width, Item.height,
       Item.status, Item.popular, Item.made_popular, Item.fave_count,
       Item.tags, Item.user_art, Item.nudity, Item.created, Item.modified,
       Item.removed, Item.nofront, Item.test, Item.recs, Item.recs_data,
       User.id, User.username, User.password, User.email, User.fullname,
       User.profileurl, User.homepage, User.bio, User.location, User.avatar,
       User.ff_user, User.ff_key, User.ff_last_faveid, User.twitter_user,
       User.twitter_pass, User.emailalerts, User.showunsafe, User.view,
       User.fb_uid, User.fb_session, User.fb_avatar, User.twitter_uid,
       User.twitter_data, User.twitter_autopost, User.uri, User.created,
       User.modified
FROM items AS Item
LEFT JOIN users AS User ON (Item.submitter_id = User.id)
WHERE Item.nofront != 1
  AND Item.removed != 1
  AND Item.made_popular IS NOT NULL
  AND nudity != 1
ORDER BY Item.made_popular DESC
LIMIT 1040, 290;
Example query without join:
SELECT Item.id, Item.submitter_id, Item.source_image, Item.cached_image,
       Item.source_title, Item.source_url, Item.width, Item.height,
       Item.status, Item.popular, Item.made_popular, Item.fave_count,
       Item.tags, Item.user_art, Item.nudity, Item.created, Item.modified,
       Item.removed, Item.nofront, Item.test, Item.recs, Item.recs_data
FROM items AS Item
WHERE Item.nofront != 1
  AND Item.removed != 1
  AND Item.made_popular IS NOT NULL
  AND nudity != 1
ORDER BY Item.made_popular DESC
LIMIT 1040, 290;
The right answer is to measure it, in the target environment, to see if it makes a difference. Then do a cost/benefit analysis to see if it's worth it.
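As a rough illustration of what "measure it" could look like, here is a minimal sketch that times a joined query against a denormalized one. It uses Python's built-in sqlite3 module with an in-memory database purely so that it runs anywhere; the table sizes, the cut-down column list and the helper names (`build_db`, `time_query`) are my assumptions, not your schema. The numbers it prints say nothing about MySQL under your load, so the same idea would need to be repeated against the real server.

```python
import sqlite3
import time


def build_db(n_users=200, n_items=5000):
    """Create an in-memory database whose items table also carries a copied username."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
        CREATE TABLE items (id INTEGER PRIMARY KEY, submitter_id INTEGER,
                            made_popular INTEGER, username TEXT);
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(i, f"user{i}") for i in range(1, n_users + 1)])
    conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)",
                     [(i, 1 + i % n_users, i, f"user{1 + i % n_users}")
                      for i in range(1, n_items + 1)])
    conn.commit()
    return conn


# Simplified stand-ins for the two queries in the question (assumed column subset).
WITH_JOIN = """
    SELECT Item.id, User.username
    FROM items AS Item
    LEFT JOIN users AS User ON Item.submitter_id = User.id
    ORDER BY Item.made_popular DESC LIMIT 290 OFFSET 1040
"""
WITHOUT_JOIN = """
    SELECT Item.id, Item.username
    FROM items AS Item
    ORDER BY Item.made_popular DESC LIMIT 290 OFFSET 1040
"""


def time_query(conn, sql, repeats=50):
    """Run the query `repeats` times; return (rows of last run, average seconds per run)."""
    start = time.perf_counter()
    for _ in range(repeats):
        rows = conn.execute(sql).fetchall()
    return rows, (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    conn = build_db()
    for label, sql in (("join", WITH_JOIN), ("no join", WITHOUT_JOIN)):
        rows, avg = time_query(conn, sql)
        print(f"{label}: {len(rows)} rows, {avg * 1000:.3f} ms per run")
```

Averaging over many runs matters here because, as you say, a single run of either query finishes too quickly to compare.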
The cost is the added storage and possibility of data getting out of sync (but see below on how to mitigate this). The benefit is the increased speed or reduced load.
Database schemas are not set-and-forget operations; they should be tuned periodically as the underlying data changes. That's what DBAs are paid for: continual monitoring and tuning.
In any case, the duplication of a column can be controlled quite easily in a decent DBMS by the use of triggers. By that, I mean place an insert/update trigger on the users table so that, if a user changes their username, it's also changed in the items table (and possibly vice versa).
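To make the trigger idea concrete, here is a small sketch, again using Python's sqlite3 module with an in-memory database so it is self-contained. The trigger name `users_username_sync`, the two-column tables and the helper functions are hypothetical. The trigger syntax shown is SQLite's; MySQL does not have the `UPDATE OF column` form, so there you would write an `AFTER UPDATE ON users FOR EACH ROW` trigger and compare `OLD.username` with `NEW.username` yourself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE items (id INTEGER PRIMARY KEY, submitter_id INTEGER, username TEXT);

    -- Hypothetical trigger: copy a changed username into the duplicated column.
    CREATE TRIGGER users_username_sync
    AFTER UPDATE OF username ON users
    FOR EACH ROW
    BEGIN
        UPDATE items SET username = NEW.username WHERE submitter_id = NEW.id;
    END;

    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO items VALUES (10, 1, 'alice'), (11, 1, 'alice'), (12, 2, 'bob');
""")


def rename_user(user_id, new_name):
    """Rename a user; the trigger rewrites the copied username in items."""
    conn.execute("UPDATE users SET username = ? WHERE id = ?", (new_name, user_id))
    conn.commit()


def item_usernames():
    """Return (item id, username) pairs ordered by item id."""
    return conn.execute("SELECT id, username FROM items ORDER BY id").fetchall()


rename_user(1, "alice2")
print(item_usernames())
```

Only the users table is written to by the application; the items rows for user 1 follow along, while the row for user 2 is left alone.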
Whether MySQL meets my definition of a decent DBMS, I can't comment - I'm a DB2 bod myself. But reversion from third normal form is a tried and tested technique for wringing every last ounce of performance out of databases and, provided you understand the consequences, is quite acceptable. Very few people complain about their databases taking up too much disk space. Many complain about how slowly their queries run.
Just keep in mind that reversion is something you do if and when you have a performance problem. It's not something that should be done just because you think it may reduce the load. Unless the load (or time taken) is actually a problem, the benefit part of your cost/benefit analysis is zero, so any sane bean counter will tell you that means "no change".
Based on your added queries, I have a couple of points to make:
- I see you have a nudity column. Please tell me how I can get access to this database :-)
- If you only need the username from the User table, you should not be getting all that extra stuff in the first query. Possibly likewise for the Item stuff - only get what you need.
- Make sure you have indexes on the columns used in the WHERE clauses - this may also entail combination indexes (those with more than one column). What gets indexed depends on your queries but each column used in the WHERE clauses is a good start for analysis.
- Consider moving removed items to a separate table (RemovedItems) to minimise the size of Items and speed up the queries. But keep in mind this is only useful if you rarely need to look for removed items, since it will complicate those queries (by forcing them to search in two tables instead of one). Again, this is a cost/benefit thing. One million rows is not really that big a table (at least in my world).

I suggest that you keep it that way to preserve a normalized table. I think it's not a good idea to put usernames in the items table, since it will make the data redundant. Have you tried re-indexing your table?
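The points above about fetching only the columns you need and indexing the filter columns could be sketched like this. It is again Python with an in-memory SQLite database; the index name `idx_items_front`, its column order, the generated data and the helper names are all assumptions for illustration. Whether MySQL would actually use such an index for your query is something only its own EXPLAIN output on the real data can tell you.

```python
import sqlite3


def build_demo_db(n_users=50, n_items=2000):
    """Create a small in-memory schema with a combination index on items."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, email TEXT, bio TEXT);
        CREATE TABLE items (id INTEGER PRIMARY KEY, submitter_id INTEGER,
                            source_title TEXT, made_popular INTEGER,
                            nofront INTEGER, removed INTEGER, nudity INTEGER);
        -- Hypothetical combination index on the sort and filter columns.
        CREATE INDEX idx_items_front
            ON items (made_popular, nofront, removed, nudity);
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)",
                     [(i, f"user{i}", f"user{i}@example.com", "bio")
                      for i in range(1, n_users + 1)])
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?, ?, ?, ?, ?)",
        [(i, 1 + i % n_users, f"title{i}",
          i if i % 7 else None,       # some items were never made popular
          1 if i % 5 == 0 else 0,     # some are kept off the front page
          1 if i % 11 == 0 else 0,    # some are removed
          0)
         for i in range(1, n_items + 1)])
    conn.commit()
    return conn


# Same filters as the question, but only the columns the page needs.
FRONT_PAGE_SQL = """
    SELECT Item.id, Item.source_title, User.username
    FROM items AS Item
    LEFT JOIN users AS User ON Item.submitter_id = User.id
    WHERE Item.nofront != 1 AND Item.removed != 1
      AND Item.made_popular IS NOT NULL AND Item.nudity != 1
    ORDER BY Item.made_popular DESC
    LIMIT 10
"""


def front_page(conn):
    """Return the first ten front-page rows as (id, title, username)."""
    return conn.execute(FRONT_PAGE_SQL).fetchall()


def query_plan(conn):
    """Return SQLite's plan descriptions for the front-page query."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + FRONT_PAGE_SQL)]


if __name__ == "__main__":
    conn = build_demo_db()
    for line in query_plan(conn):
        print(line)
    print(front_page(conn)[:3])
```

Reading the plan before and after adding an index is the quickest way to see whether the index is being picked up at all.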