Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What are the first issues to check while optimizing an existing database?

What are the top issues and in which order of importance to look into while optimizing (performance tuning, troubleshooting) an existing (but unknown to you) database?
Which actions/measures in your previous optimizations gave the most effect (with possibly the minimum of work) ?

I'd like to partition this question into following categories (in order of interest to me):

  1. one needs to show the performance boost (improvements) in the shortest time. i.e. most cost-effective methods/actions;
  2. non-intrusive or least-troublesome most effective methods (without changing existing schemas, etc.)
  3. intrusive methods

Update:
Suppose I have a copy of a database on dev machine without access to production environment to observe stats, most used queries, performance counters, etc. in real use.
This is development-related but not DBA-related question.
Update2:
Suppose the database was developed by others and was given to me for optimization (review) before it was delivered to production.
It is quite usual to have outsourced development detached from end-users.

Besides, there is a database design paradigm that a database, in contrast to application data storage, should be a value in itself independently on specific applications that use it or on context of its use.

Update3: Thanks to all answerers! You all pushed me to open subquestion
How do you stress load dev database (server) locally?

like image 382

3 Answers

  • Create a performance Baseline (non-intrusive, use performance counters)

  • Identify the most expensive queries (non-intrusive, use SQL Profiler)

  • Identify the most frequently run queries (non-intrusive, use SQL Profiler)

  • Identify any overly complex queries, or those using slowly performing constructs or patterns. (non-intrusive to identify, use SQL Profiler and/or code inspections; possibly intrusive if changed, may require substantial re-testing)

  • Assess your hardware

  • Identify Indexes that would benefit the measured workload (non-intrusive, use SQL Profiler)

  • Measure and compare to your baseline.

  • If you have very large databases, or extreme operating conditions (such as 24/7 or ultra high query loads), look at the high end features offered by your RDBMS, such as table/index partitioning.

This may be of interest: How Can I Log and Find the Most Expensive Queries?

like image 115
Mitch Wheat Avatar answered Oct 22 '22 15:10

Mitch Wheat


If the database is unknown to you, and you're under pressure, then you may not have time for Mitch's checklist which is good best practice to monitor server health.

You also need access to production to gather real info from assorted queries you can run. Without this, you're doomed. The server load pattern is important: you can't reproduce many issue yourself on a development server because you won't use the system like an end user.

Also, focus on "biggest bang for the buck". An expensive query running once daily at 3am can be ignored. A not-so-expensive one running every second is well worth optimising. However, you may not know this without knowing server load pattern.

So, basic steps..

Assuming you're firefighting:

  • server logs
  • SQL Server logs
  • sys.sysprocesses eg ASYNC_NETWORK_IO waits

Slow response:

  • profiler, with a duration filter. What runs often and is lengthy
  • most expensive query, weighted for how often used
  • open transaction with plan
  • weighted missing index

Things you should have:

  • Backups
  • Tested restore of aforementioned backups
  • Regular index and statistic maintenance
  • Regular DBCC and integrity checks

Edit: After your update

  • Static analysis is best practices only: you can't optimise for usage. This is all you can do. This is marc_s' answer.

  • You can guess what the most common query may be, but you can't guess how much data will be written or how badly a query scales with more data

  • In many shops developers provide some support, either directly or as *3rd line"

  • If you've been given a DB for review by another team that you hand over to another team to deploy: that's odd.

like image 6
gbn Avatar answered Oct 22 '22 16:10

gbn


If you're not interested in the runtime behavior of the database, e.g. what are the most frequently executed queries and those that consume the most time, you can only do a "static" analysis of the database structure itself. That has a lot less value, really, since you can only check for a number of key indicators of bad design - but you cannot really tell much about the "dynamics" of the system being used.

Things I would check for in a database that I get as a .bak file - without the ability to collect live and actual runtime performance statistics - would be:

  1. normalization - is the table structure normalized to third normal form? (at least most of the time - there might be some exceptions)

  2. do all tables have a primary key? ("if it doesn't have a primary key, it's not a table", after all)

  3. For SQL Server: do all the tables have a good clustering index? A unique, narrow, static, and preferably ever-increasing clustered key - ideally an INT IDENTITY, and most definitely not a large compound index of many fields, no GUID's and no large VARCHAR fields (see Kimberly Tripp's excellent blog posts on the topics for details)

  4. are there any check and default constraints on the database tables?

  5. are all the foreign key fields backed up by a non-clustered index to speed up JOIN queries?

  6. are there any other, obvious "deadly sins" in the database, e.g. overly complicated views, or really badly designed tables etc.

But again: without actual runtime statistics, you're quite limited in what you can do from a "static analysis" point of view. The real optimization can only really happen when you have a workload from a regular day of operation, to see what queries are used frequently and put the most stress on your database --> use Mitch's checklist to check those points.

like image 5
marc_s Avatar answered Oct 22 '22 16:10

marc_s