Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

TempDB performance crawling; should we reboot?

A bit of background: We have 17 different TempDB database files and 6 TempDB log files on the server. These are spread out on different drives, but are hosted on 2 drive arrays.

I’m seeing Disk IO response times exceeding the recommended limits. Typically you want your disks to respond in 5-10ms, with nothing going over 200ms. We’re seeing random spikes up to 800ms on the TempDB files, but only on one drive array.

Proposed solution: Restart SQL server. While SQL server is shut down, reboot the drive array hosting the majority of the TempDB files. In addition, while SQL is down, redo the network connection to bypass the network switch in an attempt to eliminate any source of slowness on the hardware.

Is this a good idea or a shot in the dark? Any ideas? Thanks in advance.

like image 239
Byron Whitlock Avatar asked Dec 27 '22 10:12

Byron Whitlock


1 Answers

17? Who came up with that number? Please read this and this - very few scenarios where > 8 files will help, particularly if you only have 2 underlying arrays/controllers. Some suggestions:

  1. Use an even number of files. Most folks start with 4 or 8, and increase beyond that only when they've proven that they still have contention (and also know that their underlying I/O can actually handle more files and scale with them; in some cases it will have no effect or the exact opposite effect - a different drive letter does not necessarily mean better I/O pathing).
  2. Make sure all of the data files are sized the same, and have identical autogrow settings. Having 17 files with different sizes and autogrowth settings will defeat round robin - in a lot of cases only one file will be used due to the way SQL Server performs proportional fill. And having an odd number just seems... well, odd to me.
  3. Get rid of your 5 extra log files. They are absolutely useless.
  4. Use trace flag 1117 to make sure that all the data files grow at the same time and (because of 2.) at the same rate. Note though that this trace flag applies to all databases, not just tempdb. More info here.
  5. You can also consider trace flag 1118 to change allocation, but please read this first.
  6. Make sure instant file initialization is on, so that the file doesn't have to be zeroed out when it expands.
  7. Pre-size your tempdb files so that they don't have to grow during normal day-to-day activity. Do not shrink tempdb files because they suddenly got big - this is just a rinse and repeat operation, since if they got that big once, they'll get that big again. It's not like you can lease out the recovered space in the meantime.
  8. When possible, perform DBCC CHECKDB elsewhere. If you're running CHECKDB regularly, yay! Pat yourself on the back. However this can take a toll on tempdb - please see this article on optimizing this operation and pulling it away from your production instance where feasible.
  9. Finally, validate what type of contention you're seeing. You say that tempdb performance crawls, but in what way? How are you measuring this? Some info on determining the exact nature of tempdb bottlenecks here and here and here and here and here.

Have you considered making less use of tempdb explicitly (fewer #temp tables, @table variables, and static cursors - or cursors altogether)? Are you making heavy use of RCSI, or MARS, or LOB-type local variables?

like image 77
Aaron Bertrand Avatar answered Jan 11 '23 06:01

Aaron Bertrand