What are some of the methods/techniques experienced SQL developers use to determine if a particular SQL query will scale well as load increases, rows in associated tables increase etc.
Some rules that I follow that make the most difference.
Don't use per-row functions in your queries like if
, case
, coalesce
and so on. Work around them by putting data in the database in the format you're going to need it, even if that involves duplicate data.
For example, if you need to lookup surnames fast, store them in the entered form and in their lowercase form, and index the lowercase form. Then you don't have to worry about things like select * from tbl where lowercase(surname) = 'smith';
Yes, I know that breaks 3NF but you can still guarantee data integrity by judicious use of triggers or pre-computed columns. For example, an insert/update trigger on the table can force the lower_surname
column to be set to the lowercase version of surname
.
This moves the cost of conversion to the insert/update
(which happens infrequently) and away from the select
(which happens quite a lot more). You basically amortise the cost of conversion.
Make sure that every column used in a where
clause is indexed. Not necessarily on its own but at least as the primary part of a composite key.
Always start off in 3NF and only revert if you have performance problems (in production). 3NF is often the easiest to handle and reverting should only be done when absolutely necessary.
Profile, in production (or elsewhere, as long as you have production data and schemas). Database tuning is not a set-and-forget operation unless the data in your tables never changes (very rare). You should be monitoring, and possibly tuning, periodically to avoid the possibility that changing data will bring down performance.
Don't, unless absolutely necessary, allow naked queries to your database. Try to control what queries can be run. Your job as a DBA will be much harder if some manager can come along and just run:
select * from very_big_table order by column_without_index;
on your database.
If managers want to be able to run ad-hoc queries, give them a cloned DBMS (or replica) so that your real users (the ones that need performance) aren't affected.
Don't use union
when union all
will suffice. If you know that there can be no duplicates between two selects of a union, there's no point letting the DBMS try to remove them.
Similarly, don't use select distinct
on a table if you're retrieving all the primary key columns (or all columns in a unique constraint). There is no possibility of duplicates in those cases so, again, you're asking the DBMS to do unnecessary work.
Example: we had a customer with a view using select distinct *
on one of their tables. Querying the view took 50 seconds. When we replaced it with a view starting select *
, the time came down to sub-second. Needless to say, I got a good bottle of red wine out of that :-)
Try to avoid select *
as much as possible. In other words, only get the columns you need. This makes little difference when you're using MySQL on your local PC but, when you have an app in California querying a database in Inner Mongolia, you want to minimise the amount of traffic being sent across the wire as much as possible.
don't make tables wide, keep them narrow as well as the indexes. Make sure that queries are fully covered by indexes and that those queries are SARGable.
Test with a ton of data before going in production, take a look at this: Your testbed has to have the same volume of data as on production in order to simulate normal usage
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With