What are the benefits/drawbacks of using a case insensitive collation in SQL Server (in terms of query performance)?
I have a database that is currently using a case-insensitive collation, and I don't really like it. I would very much like to change it to case sensitive. What should I be aware of when changing the collation?
A case-insensitive collation ignores the differences between uppercase and lowercase letters for string comparison and sorting, whereas a case-sensitive collation does not. For example, in case-insensitive collation, “A” and “a” are equal.
Database collation For example, the default server-level collation in SQL Server for the "English (United States)" machine locale is SQL_Latin1_General_CP1_CI_AS , which is a case-insensitive, accent-sensitive collation.
The SQL_Latin1_General_CP1_CI_AS collation is a SQL collation and the rules around sorting data for unicode and non-unicode data are different. The Latin1_General_CI_AS collation is a Windows collation and the rules around sorting unicode and non-unicode data are the same.
According the SQL Server Books Online, the characters included in range searches depend on the sorting rules of the collation. Collation Latin1_General_CS_AS uses dictionary order so both upper and lower case characters of the specified range are included.
If you change the collation on the database, you also have to change it on each column individually - they maintain the collation setting that was in force when their table was created.
create database CollTest COLLATE Latin1_General_CI_AI
go
use CollTest
go
create table T1 (
ID int not null,
Val1 varchar(50) not null
)
go
select name,collation_name from sys.columns where name='Val1'
go
alter database CollTest COLLATE Latin1_General_CS_AS
go
select name,collation_name from sys.columns where name='Val1'
go
Result:
name collation_name
---- --------------
Val1 Latin1_General_CI_AI
name collation_name
---- --------------
Val1 Latin1_General_CI_AI
I can't find anything to confirm whether properly constructed queries work faster on a case-sensitive vs case-insensitive database (although I suspect the difference is negligible), but a few things are clear to me:
A query like:
... WHERE UPPER(GivenName) = 'PETER'
won't use an index on GivenName. You would think something like:
... WHERE GivenName = 'PETER' COLLATE SQL_Latin1_General_CP1_CS_AS
would work better, and it does. But for maximum performance you'd have to do something like:
... WHERE GivenName = 'PETER' COLLATE SQL_Latin1_General_CP1_CS_AS
AND GivenName LIKE 'PETER'
(see this article for the details)
I would say the biggest drawback to changing to a case sensitive collation in a production database would be that many, if not most, of your queries would fail because they are currently designed to ignore case.
I've not tried to change collation on an existing datbase, but I suspect it could be quite time consuming to do as well. You probably will have to lock your users out completely while the process happens too. Do not try this unless you have thoroughly tested on dev.
If you change the database collation but not the server collation (and they then don't match as a result), watch out when using temporary tables. Unless otherwise specified in their CREATE statement, they will use the server's default collation rather than that of the database which may cause JOINs or other comparisons against your DB's columns (assuming they're also changed to the DB's collation, as alluded to by Damien_The_Unbeliever) to fail.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With