
Why is COUNT(DISTINCT fieldA) slower than COUNT(DISTINCT LTRIM(RTRIM(UPPER(fieldA))))?

I am new to execution plans in SQL Server 2005, but this mystifies me.

When I run this code ...

((SELECT COUNT(DISTINCT StudentID)
 FROM vwStudentUnitSchedules AS v
 WHERE v.UnitActive = 1
   AND v.UnitOutcomeCode IS NULL
   AND v.UnitCode = su.UnitCode
   AND v.StartDate = su.StartDate
   AND v.StudentCampus = st.StudentCampus) - 1) AS ClassSize

to get class sizes, it times out. Run on its own, it takes about 30 seconds.

But when I run it with this slight modification ...

((SELECT COUNT(DISTINCT LTRIM(RTRIM(UPPER(StudentID))))
 FROM vwStudentUnitSchedules AS v
 WHERE v.UnitActive = 1
   AND v.UnitOutcomeCode IS NULL
   AND v.UnitCode = su.UnitCode
   AND v.StartDate = su.StartDate
   AND v.StudentCampus = st.StudentCampus) - 1) AS ClassSize

It runs almost instantly.

Is it because of the LTRIM(), RTRIM() and UPPER() functions? Why would they make things go faster? My guess is that COUNT(DISTINCT ...) is an aggregate that compares values from left to right, character by character. And yes, StudentID is a VARCHAR(10).

Thanks

LoftyWofty asked Nov 02 '22 12:11


1 Answer

Caching of the query plan would certainly affect the speed of a second run.
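One way to rule caching in or out is to time both variants from a cold cache. This is only a minimal sketch, assuming a test server (the DBCC commands flush caches server-wide and should not be run in production). The variables @UnitCode, @StartDate and @StudentCampus, along with their types and values, are hypothetical stand-ins for the outer-query columns su.UnitCode, su.StartDate and st.StudentCampus.

```sql
-- Hypothetical stand-ins for the correlated outer-query columns; types and values are guesses.
DECLARE @UnitCode VARCHAR(20), @StartDate DATETIME, @StudentCampus VARCHAR(20);
SET @UnitCode = 'UNIT101';
SET @StartDate = '20090201';
SET @StudentCampus = 'MAIN';

-- Report I/O and elapsed time for each statement in the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Test server only: flush dirty pages, then drop cached data pages and cached plans.
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;

-- Variant 1: plain column.
SELECT COUNT(DISTINCT StudentID) - 1 AS ClassSize
FROM vwStudentUnitSchedules AS v
WHERE v.UnitActive = 1
  AND v.UnitOutcomeCode IS NULL
  AND v.UnitCode = @UnitCode
  AND v.StartDate = @StartDate
  AND v.StudentCampus = @StudentCampus;

-- Flush again so variant 2 also starts cold.
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;

-- Variant 2: trimmed and upper-cased column.
SELECT COUNT(DISTINCT LTRIM(RTRIM(UPPER(StudentID)))) - 1 AS ClassSize
FROM vwStudentUnitSchedules AS v
WHERE v.UnitActive = 1
  AND v.UnitOutcomeCode IS NULL
  AND v.UnitCode = @UnitCode
  AND v.StartDate = @StartDate
  AND v.StudentCampus = @StudentCampus;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

If the two variants still differ widely when both start cold, caching is probably not the explanation, and comparing the two actual execution plans would be the next step.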

Just a theory: if that isn't the case, perhaps it's down to the trim. DISTINCT has to compare every string, so if the trimmed values are shorter, there may be fewer characters to compare.

Depending on your database engine, also see whether BINARY_CHECKSUM makes it any faster. If it does, maybe my theory is right.

 ((SELECT COUNT(DISTINCT BINARY_CHECKSUM(LTRIM(RTRIM(UPPER(StudentID)))))
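One caveat with the checksum approach: BINARY_CHECKSUM is a hash, so two different StudentID values can produce the same checksum, and a COUNT(DISTINCT ...) over it may undercount. A quick sanity check, sketched here against the whole view without the outer-query filters, is to compare the two counts side by side:

```sql
-- If DistinctChecksums is lower than DistinctIds, at least two different
-- StudentID values collided on the same checksum.
SELECT
    COUNT(DISTINCT LTRIM(RTRIM(UPPER(StudentID)))) AS DistinctIds,
    COUNT(DISTINCT BINARY_CHECKSUM(LTRIM(RTRIM(UPPER(StudentID))))) AS DistinctChecksums
FROM vwStudentUnitSchedules;
```

Matching counts on today's data do not guarantee that no collision will appear later, so the checksum version is better treated as a timing experiment than as a replacement for the real count.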
Thomas Harris answered Nov 10 '22 19:11