Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

New cardinality estimator (SQL Server 2014) is way off

I have a data warehouse database and I'm facing problems with the new cardinality estimator of SQL Server 2014.

After upgrading the database server to SQL Server 2014 I have observed a big difference in query performance. Some queries are executing much slower (30 sec in SQL 2012 vs. 5 minutes in SQL 2014). After researching execution plans I've seen that the cardinality estimates on the SQL Server 2014 are way off and I can't find a reason for it.

Here's an example of a query execution plan (top-left operator) in SQL 2012 vs. SQL 2014:

Estimed Number of Rows

Some details:

  • My queries are typical data warehouse fact table load queries. I query a transactional table and join a lot (15-20) dimension tables (there's always either 0 or 1 record that is joined from the dimensional table).

  • I have updated statistics of all tables (with FULLSCAN) to be sure that the statistics is up-to-date.

  • The business keys of the dimension tables are indexed (unique non-clusted index). It seems to me that because of the uniqueness of this index the old cardinality estimator (SQL 2012) correctly assumes that there's max. 1 record that joins (the estimated number of records does not change in the execution plan).

I tried to narrow down the issue to the simplest example – SELECT with 2 joins:

Join

Here's the cardinality estimation on operators 1 and 2 in SQL 2012 vs. SQL 2014:

           | Est.rows - SQL2012 | Est.rows - SQL2014
Operator 1 |               7653 |               7653
Operator 2 |               7653 |              10000

As you can see, SQL Server 2014 misses the estimation by more than 30% (10000 vs. 7653). Because I have cca. 15-20 joins in a typical query, the final estimate goes way off.

I can put the database in the lower compatibility mode (110) and it works fine then (same like on SQL Server 2012), but I would really like to know what is the reason for this behaviour. Why is the result of cardinality estimator of SQL Server 2014 wrong?

like image 334
saso Avatar asked Oct 23 '14 13:10

saso


People also ask

When was the cardinality estimator CE algorithms overhauled to improve estimates and deliver the quickest query results in SQL Server?

In 1998, a major update of the CE was part of SQL Server 7.0, for which the compatibility level was 70.

What is option Querytraceon 9481?

This allows users with lower permissions to change the Cardinality Estimator for execution of the problematic statement.

Why is cardinality estimation important?

Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans for a query optimizer in DBMS. In the last decade, an increasing number of advanced CardEst methods (especially ML-based) have been proposed with outstanding estimation accuracy and inference latency.

What is cardinality estimate?

What is a Cardinality Estimate? A cardinality estimate is the estimated number of rows, the optimizer believes will be returned by a specific operation in the execution plan.


2 Answers

I think there is no simple answer today to this interesting question. The best answer I know is the following video: http://channel9.msdn.com/events/TechEd/NorthAmerica/2014/DBI-B331#fbid=. It has numerous examples of new and old estimators. The video is about 50+ mins long but it is worth the time.

A summary of the video that relates to this question:

Old assumptions of cardinality estimates:

  1. Uniformity – data is uniformly distributed.
  2. Independence – column 1 has no relation to column 2.
  3. Containment – when two attributes might be the same, they are assumed to be the same.
  4. Inclusion – there should be a match.

To Use SQL SERVER 2012 cardinality estimator in SQL SERVER 2014 use the following option:

  • Option (querytraceon 9481) --revert to 2012

What is new estimator doing (based on video):

  • SQL Server uses average selectivity in the index and estimates number of rows by multiplying density of the key to total numbers of rows in the index.
  • New estimator does not work very well with jagged distributions.
  • Most differences between the estimators are based on WHERE clause.
  • New cardinality estimator believes there is a correlation between the tables.
  • You can create filtered statistics to improve queries. (http://msdn.microsoft.com/en-us/library/ms188038.aspx )

To do / checklist:

1. Auto Create / Update Stats
2.  Check database compatibility mode (120/110)
3.  Test using query trace flags
4.  XML showplan

Update What's new in cardinality estimator (SQL Server 2016)

  1. The more accurate.
  2. The CE predicts how many rows your query will likely return
  3. SQL Server 2016 the query store
  4. Another option for tracking the cardinality predictions of the CE is to use the extended event named query_optimizer_estimate_cardinality
  5. CE understands maximum value might be higher than when statistics were last gathered
  6. CE understands that filtered predicates on the same table are often correlated
  7. CE no longer assumes any correlation between filtered predicates from different tables

More details:

https://docs.microsoft.com/en-us/sql/relational-databases/performance/cardinality-estimation-sql-server

https://www.sqlshack.com/query-optimizer-changes-in-sql-server-2016-explained/

like image 140
BI Dude Avatar answered Oct 21 '22 13:10

BI Dude


I wonder if you are running into this issue around multicolumn selectivity estimates:

http://www.sqlskills.com/blogs/kimberly/multi-column-statistics-exponential-backoff/

it seems that there are still some quirks with the new CE try also using TF 4137 as outlined and see if that helps.

finally make sure you are on the latest CU and are running with TF 4199 to blanket enable all query optimizer fixes as always test this in a non-production environment if possible first and be mindful of regressions in other queries when enable settings globally

like image 22
Jason Horner Avatar answered Oct 21 '22 12:10

Jason Horner