Controlling SQL Server's best-fit Unicode transformation

Tags:

sql-server

sql-server-2012

unicode

penetration-testing

A recent whitehat scan made me aware of SQL Server's best-fit Unicode transformations. When a string containing Unicode characters is converted to a non-Unicode string, SQL Server does a best-fit replacement on the characters it can, so that your data is not trashed with question marks. For example:

SELECT 'ŤĘŞŤ'

Outputs "TEST"

Each character is replaced with a "similar" ASCII equivalent. The same thing can be seen with a single character: Unicode character 65308 (＜) is converted into ASCII character 60 (<).

SELECT ascii(NCHAR(65308))

Outputs "60"

The main question is: where the heck is this documented? I have Googled for all sorts of phrases and read Microsoft docs, but all I can find are people looking to do manual conversions, and nothing that documents SQL Server's apparent automatic best-fit Unicode transformations. Furthermore, can this be turned off or configured?

While the behavior is convenient for apps that do not store strings as Unicode and probably goes completely unnoticed in most scenarios, penetration tests report this as a "high" vulnerability, since Unicode transformations can be used to circumvent validation routines and lead to vulnerabilities such as XSS.
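One way to reason about the validation concern is to flag, on the application side, any character that has no exact representation in the target code page before it ever reaches a non-Unicode string. The sketch below is only an illustration in Python, not a SQL Server setting. `unrepresentable_chars` is a hypothetical helper, and it assumes the target code page is Windows-1252 and relies on Python's `cp1252` codec being strict (it raises an error rather than applying any best-fit replacement).

```python
def unrepresentable_chars(text, codec="cp1252"):
    """Return (index, character, code point) for every character that the
    given codec cannot encode exactly.

    Assumption: such characters are the ones a later best-fit conversion
    would either swap for a look-alike or replace with '?'.
    """
    flagged = []
    for index, ch in enumerate(text):
        try:
            ch.encode(codec)  # strict by default: no best-fit, no '?'
        except UnicodeEncodeError:
            flagged.append((index, ch, ord(ch)))
    return flagged


if __name__ == "__main__":
    # U+FF1C is the fullwidth less-than sign from the question.
    print(unrepresentable_chars("\uff1cscript>"))
    print(unrepresentable_chars("TEST"))
```

A validator built this way could reject or normalize the fullwidth character at index 0 instead of letting a later conversion turn it into `<` after the validation step has already passed.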

asked Sep 21 '15 22:09

Brad Wood

1 Answer

(The following is an excerpt from my answer to the related question on DBA.StackExchange: Automatic Translation when Converting Unicode to non-Unicode / NVARCHAR to VARCHAR)

These "best fit" mappings are documented, just not in the easiest of places to find. If you go to the following URL you will see a list of several files, each one named for the Code Page that it maps Unicode characters to:

ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/

Most of the files were last updated (or at least placed there) on 2006-10-04, and one of them was updated on 2012-03-14. The first part of each file maps the Code Page's byte values (loosely, "ASCII codes") to their equivalent Unicode Code Points. The second part of each file maps Unicode characters to their best-fit "equivalents" in that Code Page.
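To make the two-part layout concrete, here is a minimal Python sketch that reads only the second part (Unicode to Code Page) into a dictionary. Everything about the layout here is an assumption rather than something taken from the files themselves: the section keywords `MBTABLE` and `WCTABLE`, the `0x... 0x... ;comment` line shape, and the `SAMPLE` text, which is hand-written for illustration. `parse_wctable` is a hypothetical helper name.

```python
# Hand-written sample in the ASSUMED layout; not copied from a real file.
SAMPLE = """\
CODEPAGE 1252
MBTABLE 2
0x3c 0x003c ;Less-Than Sign
0x41 0x0041 ;Latin Capital Letter A
WCTABLE 3
0x0041 0x41 ;Latin Capital Letter A
0x0164 0x54 ;Latin Capital Letter T With Caron
0xff1c 0x3c ;Fullwidth Less-Than Sign
ENDCODEPAGE
"""


def parse_wctable(text):
    """Return {unicode_code_point: code_page_byte} from the WCTABLE section."""
    mapping = {}
    in_wctable = False
    for raw_line in text.splitlines():
        line = raw_line.split(";", 1)[0].strip()  # drop trailing comment
        if not line:
            continue
        fields = line.split()  # handles tabs and spaces alike
        if fields[0] == "WCTABLE":
            in_wctable = True
            continue
        if not fields[0].startswith("0x"):
            in_wctable = False  # any other keyword ends the section
            continue
        if in_wctable and len(fields) >= 2:
            mapping[int(fields[0], 16)] = int(fields[1], 16)
    return mapping


if __name__ == "__main__":
    for code_point, byte in sorted(parse_wctable(SAMPLE).items()):
        print(f"U+{code_point:04X} -> 0x{byte:02X} ({chr(byte)!r})")
```

The two non-trivial sample entries mirror the conversions reported in the question (Ť to T, and the fullwidth less-than sign to `<`). A real file would need to be checked against these layout assumptions before relying on the parser.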

I wrote a test script that uses the Code Page 1252 mappings to check if SQL Server is truly using those mappings. That can be determined by answering these two questions:

1. For all mapped Code Points, does SQL Server convert them into the specified mappings?
2. For all unmapped Code Points, does SQL Server convert any of them into a non-"?" character?
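The logic behind those two questions can be sketched in a few lines of Python. This is not the test script referred to in this answer, only an illustration of the two checks. `check_best_fit` and `fake_convert` are hypothetical names, and `fake_convert` stands in for whatever actually asks SQL Server to convert a Code Point (for example, a query run through a database driver).

```python
UNKNOWN = 0x3F  # the "?" character


def check_best_fit(mapping, convert, code_points, unknown=UNKNOWN):
    """Run both checks and return two lists of offending code points.

    mismatched: mapped code points whose conversion differs from the mapping
                (question 1 is "Yes" when this list is empty).
    leaked:     unmapped code points that became something other than "?"
                (question 2 is "No" when this list is empty).
    """
    mismatched = [cp for cp in code_points
                  if cp in mapping and convert(cp) != mapping[cp]]
    leaked = [cp for cp in code_points
              if cp not in mapping and convert(cp) != unknown]
    return mismatched, leaked


if __name__ == "__main__":
    # Tiny illustrative mapping; a real run would load the full file.
    mapping = {0x0041: 0x41, 0x0164: 0x54, 0xFF1C: 0x3C}

    def fake_convert(code_point):
        # Stand-in that follows the mapping exactly.
        return mapping.get(code_point, UNKNOWN)

    print(check_best_fit(mapping, fake_convert, range(0x10000)))
```

With a converter that follows the mapping exactly, both lists come back empty, which corresponds to "Yes" for the first question and "No" for the second.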

The test script is too long to place here, so I posted it on Pastebin at:

Unicode to Code Page mappings in SQL Server

Running the script will show that the answer to the first question above is "Yes" (meaning that all of the provided mappings are adhered to). It will also show that the answer to the second question is "No" (meaning, none of the unmapped Code Points convert into anything but the character for "unknown"). Hence, that mapping file is very accurate :-).

Furthermore, can this be turned off or configured?

I do not believe so, but that doesn't mean it is impossible to do one or both. HOWEVER, it should be noted that these mappings are "Microsoft" mappings, and hence work with Windows and SQL Server; they are not SQL Server-specific. So, even if it is possible to find where this stuff is configured, it would probably be a bad idea to change it, since doing so would affect everything running on the OS.

answered Oct 23 '22 03:10

Solomon Rutzky
