
Controlling SQL Server's best-fit Unicode transformation

A recent whitehat scan made me aware of SQL Server's best-fit Unicode transformations. When a string containing Unicode characters is converted to a non-Unicode string, SQL Server does a best-fit replacement on the characters it can, so that your data is not trashed with question marks. For example:

SELECT 'ŤĘŞŤ'

Outputs "TEST"

Each character is replaced with a "similar" ASCII equivalent. This can also be seen on a single character, where Unicode character 65308 (＜, the fullwidth less-than sign) is converted into ASCII character 60 (<).

SELECT ascii(NCHAR(65308))

Outputs "60"
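
To make the contrast explicit, here is a minimal sketch that puts the two conversions above next to their unconverted forms. It assumes the current database's default collation uses code page 1252 (for example Latin1_General_CI_AS); other code pages may map these characters differently. The column aliases are made up for illustration.

```sql
-- Assumption: the current database's default collation uses code page 1252.
SELECT
    N'ŤĘŞŤ'                                 AS kept_as_unicode, -- N prefix makes this NVARCHAR, so nothing is converted
    CAST(N'ŤĘŞŤ' AS VARCHAR(10))            AS best_fit,        -- explicit NVARCHAR -> VARCHAR conversion
    UNICODE(NCHAR(65308))                   AS original_code,   -- code point before conversion
    ASCII(CAST(NCHAR(65308) AS VARCHAR(1))) AS converted_code;  -- character code after conversion
```

The literal in the first example has no N prefix, so it is treated as a non-Unicode string and the replacement happens before the value is ever returned.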

The main question is: where the heck is this documented? I have Googled all sorts of phrases and read Microsoft docs, but all I can find are people looking to do manual conversions, and nothing that documents SQL Server's apparent automatic best-fit Unicode transformations. Furthermore, can this be turned off or configured?

While the behavior is convenient for apps that do not store strings as Unicode and probably goes completely unnoticed in most scenarios, penetration tests report this as a "high" vulnerability, since Unicode transformations can be used to circumvent validation routines and lead to vulnerabilities such as XSS.
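
A rough sketch of how such a bypass could play out, written entirely in T-SQL for brevity (in practice the validation would usually live in application code). The table variable, its column, and the variable names are hypothetical, and the check is deliberately naive.

```sql
-- Hypothetical non-Unicode column; the collation is stated explicitly so the
-- target code page (1252) does not depend on the database default.
DECLARE @comments TABLE (body VARCHAR(100) COLLATE Latin1_General_CI_AS);

-- Input that uses the fullwidth less-than sign (65308) instead of '<'.
DECLARE @input NVARCHAR(100) = NCHAR(65308) + N'script';

-- Naive validation: reject input containing '<'. The binary collation forces
-- an exact code point comparison, as an application-level check would do.
IF CHARINDEX(N'<', @input COLLATE Latin1_General_BIN2) = 0
    INSERT INTO @comments (body) VALUES (@input);  -- implicit NVARCHAR -> VARCHAR

-- Inspect what was actually stored and the code of its first character.
SELECT body, ASCII(LEFT(body, 1)) AS first_char_code
FROM @comments;
```

If the mapping shown above applies, the stored value begins with character 60 even though the check never saw a '<'.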

Brad Wood asked Sep 21 '15 22:09





1 Answer

(the following is an excerpt from my answer to the related question on DBA.StackExchange: Automatic Translation when Converting Unicode to non-Unicode / NVARCHAR to VARCHAR)

These "best fit" mappings are documented, just not in the easiest of places to find. If you go to the following URL you will see a list of several files, each one named for the Code Page that it maps Unicode characters to:

ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/

Most of the files were last updated (or at least placed there) on 2006-10-04, and one of them was updated on 2012-03-14. The first part of each file maps the code page's character codes to their equivalent Unicode Code Points. The second part maps Unicode characters to their "best fit" equivalents in that code page.

I wrote a test script that uses the Code Page 1252 mappings to check if SQL Server is truly using those mappings. That can be determined by answering these two questions:

  1. For all mapped Code Points, does SQL Server convert them into the specified mappings?
  2. For all unmapped Code Points, does SQL Server convert any of them into a non-"?" character?

The test script is too long to place here, so I posted it on Pastebin at:

Unicode to Code Page mappings in SQL Server

Running the script will show that the answer to the first question above is "Yes" (meaning that all of the provided mappings are adhered to). It will also show that the answer to the second question is "No" (meaning, none of the unmapped Code Points convert into anything but the character for "unknown"). Hence, that mapping file is very accurate :-).
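
This is not the Pastebin script, only a minimal sketch of the general idea: enumerate every 16-bit code point, convert it under a code page 1252 collation, and list the ones that do not turn into "?". The result could then be compared by hand against the 1252 mapping file. The collation choice and the CTE names are assumptions made for illustration.

```sql
-- Generate the integers 0..65535 (one per 16-bit code point).
WITH code_points AS (
    SELECT TOP (65536)
           CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS INT) AS cp
    FROM sys.all_columns AS a
    CROSS JOIN sys.all_columns AS b
),
-- Convert each code point to VARCHAR using a code page 1252 collation.
converted AS (
    SELECT cp,
           CONVERT(VARCHAR(1), NCHAR(cp) COLLATE Latin1_General_100_CI_AS) AS ch
    FROM code_points
)
-- Keep only code points that did not become '?' (character 63).
-- Code point 63 itself is a genuine '?', so it is filtered out here as well.
SELECT cp,
       NCHAR(cp)  AS original,
       ch         AS converted,
       ASCII(ch)  AS converted_code
FROM converted
WHERE ASCII(ch) <> 63
ORDER BY cp;
```

Rows whose code point is above 255 and whose converted code is below 128 are the ones where a non-ASCII character was swapped for an ASCII look-alike.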

"Furthermore, can this be turned off or configured?"

I do not believe so, but that doesn't mean it is impossible to do one or both. HOWEVER, it should be noted that these mappings are "Microsoft" mappings, and hence work with Windows and SQL Server; they are not SQL Server-specific. So, even if it is possible to find where this stuff is configured, it would probably be a bad idea to change it, since doing so would affect everything running on the OS.

Solomon Rutzky answered Oct 23 '22 03:10
