SQL Server "" operator is very slow compared to "=" on table with a few million rows

Question

I have two tables. Forms has ~77000 rows. Logs has ~2.7 million rows.

The following query returns "30198" in less than a second:

SELECT COUNT(DISTINCT logs.DOCID) FROM logs, forms WHERE logs.DOCID = forms.DOCID;

And this query has been running for ~15 minutes so far, and still hasn't finished:

SELECT COUNT(DISTINCT logs.DOCID) FROM logs, forms WHERE logs.DOCID <> forms.DOCID;

Why is the "not equal" query so much slower?

Marc B · Accepted Answer

Because = reduces the join operation to one single matching row from each table (presuming those docids are unique).

Think of it this way- you've got a dance with 5 boys and 5 girls:

Adam      Alice
Bob       Betty
Charly    Cathy
Dick      Deb
Evan      Elly

You pair them up by first letter. So

Adam->Alice
Bob->Betty
etc...

One single pairing

But if you pair them up by "First letters do NOT match", you end up with:

Adam->Betty
Adam->Cathy
Adam->Deb
Adam->Elly
Bob->Alice
etc...

you've MASSIVELY increased the number of pairings. This is why your <> query is taking so long. You're essentially trying to fetch m x n rows, rather than just min(m,n). With this data, you end up with 25 rows, rather than 5. For your specified table sizes, you're working with 77,000 * 2,700,000 = 207.9 billion rows, minus 77,000 where the two ids match up, for a total of 207,899,923,000 rows in the joined data set.

given your query requirements, try a left join and look for null right-side records:

SELECT DISTINCT logs.DOCID
FROM logs
LEFT JOIN forms ON logs.DOCID = forms.DOCID
WHERE forms.DOCID IS NULL

SQL Server "<>" operator is very slow compared to "=" on table with a few million rows

Tags:

performance

sql

sql-server

Tom Redman

1 Answers

Marc B

Recent Activity

Donate For Us