I have a problem where I am storing a UTF-8 string in SQL Server as UCS-2. When I pull it out to display on a page with the content type set to UTF-8, it works fine. But I have a third-party JavaScript component, and when I pass it the string from the database it renders it as UCS-2, or at least not as UTF-8.
Is there a way in ASP to convert this string to UTF-8 after reading it from the database to pass it to the third party component (obfuscated)?
Hope this makes sense.
My suspicion is you are falling foul of the classic form post character encoding mismatch problem.
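It goes like this: the page tells the browser it is UTF-8, so the browser posts form fields as UTF-8 bytes. The server has not been told which code page to use when reading Request.Form, so it decodes those bytes with its default code page (typically Windows-1252), and each byte of a multi-byte character becomes a separate character in the stored string. When the page later writes the string back out using the same code page, each character turns back into the original byte, the browser reads those bytes as UTF-8, and everything looks correct.
If you examine the field contents directly with SQL Server tools you will likely see the corrupted strings there. It is only now, when you want to use the string with another component that expects a straightforward Unicode string, that the bug comes to light.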
The solution is to ensure that every page not only sends CharSet = "UTF-8" in the response but also sets Response.CodePage = 65001 before using Response.Write and before attempting to read any Request.Form values. You can also declare the code page with the CodePage attribute in the <%@ %> directive at the top of the page.
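As a rough sketch of what such a page header might look like (the form field name "title" is made up for illustration, and the file itself is assumed to be saved as UTF-8):

```vbscript
<%@ Language="VBScript" CodePage="65001" %>
<%
' Tell ASP to convert strings to and from UTF-8 for this response,
' and tell the browser which encoding the page uses.
Response.CodePage = 65001
Response.CharSet = "UTF-8"

' Read form values only after the code page has been set.
' "title" is a hypothetical field name.
Dim sTitle
sTitle = Request.Form("title")

Response.Write Server.HTMLEncode(sTitle)
%>
```

With both settings in place, the bytes posted by the browser and the code page ASP uses to decode them should agree, so new rows are stored correctly.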
Now you are left with repairing the corrupt strings already in your DB.
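Use an ADODB.Stream:

    Function ConvertFromUTF8(sIn)
        Dim oIn: Set oIn = CreateObject("ADODB.Stream")
        oIn.Open
        ' Write the corrupted string out as Windows-1252 bytes,
        ' which recovers the original UTF-8 byte sequence.
        oIn.CharSet = "Windows-1252"
        oIn.WriteText sIn
        ' Rewind, then read the same bytes back as UTF-8.
        oIn.Position = 0
        oIn.CharSet = "UTF-8"
        ConvertFromUTF8 = oIn.ReadText
        oIn.Close
    End Function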
This function (which BTW is the answer to your actual question) takes a corrupted string (one that holds a byte-for-byte representation of the UTF-8 data, with one character per byte) and converts it to the string it should have been. You need to apply this transform to every field in the DB that has fallen victim to the bug.
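A one-off repair pass over a table could look something like the sketch below. The connection string, the table name Articles and the columns Id and Title are all placeholders, and it assumes ConvertFromUTF8 from above is in scope. Back up the data first and run it only once: a string that was never corrupted, or has already been repaired, would likely be damaged by a second conversion.

```vbscript
' Sketch only: connection string, table and column names are placeholders.
Const adOpenKeyset = 1
Const adLockOptimistic = 3

Dim oConn: Set oConn = CreateObject("ADODB.Connection")
oConn.Open "Provider=SQLOLEDB;Data Source=myServer;Initial Catalog=myDb;Integrated Security=SSPI"

Dim oRs: Set oRs = CreateObject("ADODB.Recordset")
oRs.Open "SELECT Id, Title FROM Articles", oConn, adOpenKeyset, adLockOptimistic

Do Until oRs.EOF
    ' Leave NULLs alone; repair everything else in place.
    If Not IsNull(oRs("Title").Value) Then
        oRs("Title").Value = ConvertFromUTF8(oRs("Title").Value)
        oRs.Update
    End If
    oRs.MoveNext
Loop

oRs.Close
oConn.Close
```

As a quick sanity check before touching real data, ConvertFromUTF8(ChrW(&HC3) & ChrW(&HA9)) should return the single character ChrW(&HE9), "é", since C3 A9 is the UTF-8 encoding of that character.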