Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Slow running query, CF 9 & MSSQL 2008; corrupt execution plan?

I've been working on Coldfusion/MS SQL for years and this is one of the strangest problems I've seen. The problem itself has since been resolved but I don't really understand what happened; this question is attempting to get a clear idea of the likely cause.

The Problem

In a stable production environment, for no apparent reason, one query starts returning in around 1,000-1,500 ms (about 10 times slower than usual). I was able to isolate it to this:

<cfquery datasource="#ds#" name="query">
   select 1
   from eLearning.v_courseCompletion cc
   where 
      cc.memberIncId = <cfqueryparam value="3" cfsqltype="cf_sql_integer"> and
      cc.courseId = <cfqueryparam value="25" cfsqltype="cf_sql_integer"> and 
      cc.currentCourseCompleted = 1
</cfquery>

What's strange is, this behaviour is aggravated when within a loop, even when there's a single iteration, like in this example:

<cfloop from="1" to="1" index="i">
   <cfquery datasource="#ds#" name="query">
      select 1
      from eLearning.v_courseCompletion cc
      where 
         cc.memberIncId = <cfqueryparam value="3" cfsqltype="cf_sql_integer"> and
         cc.courseId = <cfqueryparam value="25" cfsqltype="cf_sql_integer"> and 
         cc.currentCourseCompleted = 1
   </cfquery>
</cfloop>

This should be exactly the same as above, right? The loop should have no effect but instead, this test runs about 10 times slower, returning in between 7,000-16,000 ms. This is how the problem was detected; the query (buried within a object method) was being called from the body of a loop, if the loop iterated more than 5 or 6 times the request would timeout.

To me, this indicated a problem on the Coldfusion side but restarting the service, or indeed the machine, did nothing.

Meanwhile, once isolated, I noticed that making any change to the query itself caused the performance to return to the expected level, around 150-190 ms. For example:

  • Changing the selected fields (ie. select *)
  • Removing the table alias (cc)
  • Replacing either <cfqueryparam> with an inline value
  • Removing any of the conditions

Any of these changes "fixed" the problem but, when running the original query, the performance problem would return.

The Solution

At this point I guessed the query's execution plan had been corrupted or something, did some Googling, and ran DBCC FREEPROCCACHE against the DB server. This fixed the problem immediately. Great, problem solved....

The Question

Since then though, I've done a bit more research and the consensus seems to be that execution plans "don't get corrupted". There's a some talk of similar problems occurring with stored procedures and parameter sniffing but I'm not using any sp's here. We are selecting from a fairly complicated view though (eLearning.v_courseCompletion) with nested joins. Is that the issue?

Basically, what actually happened here? How to I stop it from happening again?

.. and what the hell is the connection to loops in CF?!?

Versions

  • Coldfusion 9.0.2.282541 (64 bit)
  • SQL Server Express 10.50.4297 (64 bit)
  • Both servers are Win Server 2008 R2 Datacenter (64 bit)
like image 445
Molomby Avatar asked Oct 10 '14 01:10

Molomby


1 Answers

You are using a stored procedure under the hood when you use a cfqueryparam. When you don't use a cfqueryparam, your query is simply sent in as a "free text" batch query. When you use a cfqueryparam, you are sending your query in to be executed using sp_executeSQL() which itself will send your query body in as a parameter. This allows the query plan to cache. If it sees the same query, it will use the stats it has saved for that particular plan. This means, that if it ran with some really oddball data and got a bad idea for executing the query, the next iterations would all use the same plan, which is a "bad plan" for 99% of the use cases of this query, but maybe a good plan for that one oddball instance.

Each query that is executed with sp_execute SQL also returns a numeric handle that the JDBC driver can use to simply tell SQL which plan it can use, basically a shortcut. This is called "max pooled statements" in your DSN settings in CFadmin. Having that set to 0 or 1000 does not affect the fact that you will be taking advantage of a plan cache with sp_executeSQL.

http://blogs.msdn.com/b/turgays/archive/2013/09/18/exec-vs-sp-executesql.aspx

StackOverflow had a good demonstration of this, if one particular power user would load their account page with their millions of badges and points before the query stats were built, it would mess up the stats for every other user who has but a few hundred or so points and a handful of badges, making the page slow for him or her.

like image 148
J.T. Avatar answered Nov 09 '22 03:11

J.T.