
Questions about accent insensitivity in SQL Server (Latin1_General_CI_AS)

All our databases were installed using the default collation (Latin1_General_CI_AS).

We plan to change the collation to allow clients to search the database with accent insensitivity.

Questions:

  1. What are the negatives (if any) of having an accent insensitive database?

  2. Are there any performance overheads for an accent insensitive database?

  3. Why is the default for SQL Server collation accent sensitive; why would anyone want accent sensitive by default?

Brett Postin asked Jan 25 '13 16:01




1 Answer

Seriously, changing database collations is a royal pain. See this HOWTO from CodeProject, and then think hard before you do it. And that article describes the EASY way!

  • http://www.codeproject.com/Articles/302405/The-Easy-way-of-changing-Collation-of-all-Database
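Part of the pain is that a database's collation is only the default for new columns: existing character columns keep the collation they were created with and have to be altered one by one. Indexes and constraints that depend on them usually have to be dropped first. A rough sketch for taking stock before deciding, using only the standard catalog views (run it in the database in question; the column aliases are just illustrative):

    -- Default collation of the current database
    select databasepropertyex(db_name(), 'Collation') as database_collation;

    -- Every character column in a user table, with the collation it actually uses
    select schema_name(t.schema_id)     as schema_name,
           t.name                       as table_name,
           c.name                       as column_name,
           type_name(c.user_type_id)    as data_type,
           c.collation_name
    from sys.columns as c
    join sys.tables  as t on t.object_id = c.object_id
    where c.collation_name is not null
    order by schema_name, table_name, column_name;

The number of rows the second query returns is a fair indication of how much work a full collation change would be.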

First, you don't necessarily have to change the collation at all. You can permit accent-insensitive searches simply by specifying the collation as part of the search:

    select * from TableName
    where name collate Latin1_General_CI_AI like @parameter

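Simple as that. However, this will hurt index usage: an index on name is ordered by the column's own collation, so once the predicate applies a different collation, SQL Server can generally no longer seek on that index and falls back to a scan.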
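To see the override in action without touching a real table, here is a minimal self-contained sketch. The sample names and the @parameter value are made up for illustration, and it assumes SQL Server 2008 or later for the inline declare and the values row constructor:

    declare @parameter nvarchar(20) = N'jose%';

    -- Inline sample rows standing in for TableName.name
    select v.name
    from (values (N'José'), (N'Jose'), (N'Josie')) as v(name)
    -- The collate clause makes this one comparison case- and accent-insensitive
    where v.name collate Latin1_General_CI_AI like @parameter;

This should return both José and Jose. With Latin1_General_CI_AS in the collate clause instead, only Jose would match.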

An alternative is to add a computed column with the accent-insensitive collation, which you can index separately.

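    -- Base table: the column itself stays accent-sensitive
    create table TableName(
        ix int identity primary key,
        name nvarchar(20) collate Latin1_General_CI_AS
    )
    go
    -- Computed column exposing the same value under an accent-insensitive collation
    alter table TableName
    add name_AI as name collate Latin1_General_CI_AI
    go
    -- Index the computed column so accent-insensitive searches can seek on it
    create index IX_TableName_name_AI
    on dbo.TableName(name_AI)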
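
Searching then means filtering on the computed column rather than on name. A minimal sketch, assuming the table and index above exist and using a made-up @parameter value:

    declare @parameter nvarchar(20) = N'jose%';

    -- Filter on name_AI so the accent-insensitive index can be used;
    -- still return the original, accent-preserving name column
    select ix, name
    from dbo.TableName
    where name_AI like @parameter;

Note that a pattern with a leading wildcard (for example N'%jose') would still force a scan, whichever collation is used.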

The example above puts it in the table, but you could just as well create an indexed view.

    create view dbo.TableName_AI
    with schemabinding
    as 
    select ix,
    name collate Latin1_general_CI_AI as name
    from dbo.TableName
    go
    -- Need a unique clustered index first
    create unique clustered index IX_TableName_AI_Clustered on dbo.TableName_AI(ix)
    -- then the index for searching
    create index IX_TableName_AI_name on dbo.TableName_AI(name)

Then, for accent-insensitive searches, use the view TableName_AI.
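A sketch of such a search against the view, again with a made-up @parameter value. The noexpand hint is there because, as far as I know, only some editions of SQL Server (Enterprise, historically) consider the view's indexes automatically; on the others the view is expanded back to the base table unless you ask otherwise:

    declare @parameter nvarchar(20) = N'jose%';

    -- noexpand tells the optimizer to treat the indexed view like a table
    -- and use its own indexes instead of expanding the view definition
    select ix, name
    from dbo.TableName_AI with (noexpand)
    where name like @parameter;

Creating an indexed view also requires certain session settings (such as ansi_nulls and quoted_identifier) to be on, so check those if the create index statements fail.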

To answer your specific questions:

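  1. In an accent-insensitive database, accent-sensitive searches will be slower, because they then need the same kind of collate override (or a separately indexed column) as shown above, just in the other direction.

  2. Yes, but not so you would notice.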

  3. It just is. Something has to be the default: If you don't like it don't use the default!

    Think of it this way: "Hard" and "Herd" are not the same word. That one vowel difference is enough - even though they sound similar.

    An accent difference (a vs. á) is somewhere between a case difference (A vs. a) and a letter difference (a vs. e). You have to draw the line somewhere.

    An accent affects the sound of the word and can make it have a different meaning, though I struggle to think of examples. I guess it makes more sense to someone who has words in their database in a language which makes use of accents.
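
One example from Spanish: "papa" (potato, or pope) and "papá" (dad) differ only by an accent. A small sketch showing how the two collations discussed here treat that pair (the column aliases are just illustrative):

    select
        -- Accent-sensitive: the two words are distinct
        case when N'papa' = N'papá' collate Latin1_General_CI_AS
             then 'equal' else 'different' end as accent_sensitive,
        -- Accent-insensitive: the accent is ignored
        case when N'papa' = N'papá' collate Latin1_General_CI_AI
             then 'equal' else 'different' end as accent_insensitive;

The first column comes back as 'different' and the second as 'equal', which is exactly the distinction an accent-insensitive database gives up.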

Ben answered Oct 13 '22 11:10