Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server not difference between 'ی' and 'ي' in Arabic_CI_AS collation

I'm using ASCII function for getting equivalent ASCII code of two characters, but I'm surprised when seeing there is no difference between 'ي' and 'ی', can anyone help me?

SELECT ASCII('ي'), ASCII('ی')
like image 912
Hossein POURAKBAR Avatar asked Mar 10 '16 11:03

Hossein POURAKBAR


People also ask

Which is the best collation for SQL Server?

However here are the settings we typically recommend: Set the default collation to SQL_Latin1_General_CP1_CI_AS. Ensure that you have SQL Server running in Case Insensitive mode. We use NCHAR, NVARCHAR string types so all data is unicode, so no character set is specified.

What is collate latin1_general_ci_as in SQL Server?

The collate clause is used for case sensitive and case insensitive searches in the columns of the SQL server. There are two types of collate clause present: SQL_Latin1_General_CP1_CS_AS for case sensitive. SQL_Latin1_General_CP1_CI_AS for case insensitive.

What is SQL Server collation settings?

A collation specifies the bit patterns that represent each character in a dataset. Collations also determine the rules that sort and compare data. SQL Server supports storing objects that have different collations in a single database.


4 Answers

Because your character is non Unicode you have to use UNICODE() function instead of ASCII() .

SELECT ASCII('ي'), ASCII('ی')

will result: 237, 237

but

SELECT UNICODE(N'ي'), UNICODE(N'ی')

will result: 1610, 1740

like image 83
Behzad Avatar answered Oct 02 '22 19:10

Behzad


Try this

SELECT UNICODE(N'ي'), UNICODE(N'ی')
like image 30
DimaSUN Avatar answered Sep 29 '22 19:09

DimaSUN


Another solution by using the proper collate in case you want to use Ascii

Arabic_CS_AS_KS

result will come as ى = 236 and ي= 237

like image 36
Monah Avatar answered Oct 02 '22 19:10

Monah


This is a limitation ASCII function. According to the documentation, ASCII:

Returns the ASCII code value of the leftmost character of a character expression.

However, the characters in your question are made up of more than one byte. It appears that ASCII can only read one byte.

When you use these characters as string literals without the N prefix, they are treated as single-byte characters. The following query shows that SQL Server does not treat these characters as equal in the Arabic_CI_AS collation when they are properly marked as multi-byte:

SELECT CASE WHEN 'ي' COLLATE Arabic_CI_AS <> 'ی' COLLATE Arabic_CI_AS
THEN 1 ELSE 0 END AS are_different_ascii,
CASE WHEN N'ي' COLLATE Arabic_CI_AS <> N'ی' COLLATE Arabic_CI_AS
THEN 1 ELSE 0 END AS are_different_unicode

The following query shows the bytes that make up the characters:

SELECT CAST(N'ي' COLLATE Arabic_CI_AS as varbinary(4)),
CAST(N'ی' COLLATE Arabic_CI_AS as varbinary(4)),
CAST('ي' COLLATE Arabic_CI_AS as varbinary(4)),
CAST('ی' COLLATE Arabic_CI_AS as varbinary(4))

However, even when you mark the characters as unicode, the ASCII function returns the same value because it can only read one byte:

SELECT ASCII(N'ي' COLLATE Arabic_CI_AS) , ASCII(N'ی' COLLATE Arabic_CI_AS)

EDIT As TT. points out, these characters don't have an entry in the ASCII code table.

like image 32
Ed Harper Avatar answered Sep 30 '22 19:09

Ed Harper