Convert UTF-8 string to ISO-8859-1

My Classic ASP application retrieves a UTF-8 string from its database, but I need to convert it to ISO-8859-1. I can't change the HTML page encoding.

I really need to convert just the fetched string. How can I do it?

Metalcoder asked Mar 03 '15 14:03


2 Answers

I found the answer here:

Const adTypeBinary = 1
Const adTypeText = 2

' accept a string and convert it to Bytes array in the selected Charset
Function StringToBytes(Str,Charset)
  Dim Stream : Set Stream = Server.CreateObject("ADODB.Stream")
  Stream.Type = adTypeText
  Stream.Charset = Charset
  Stream.Open
  Stream.WriteText Str
  Stream.Flush
  ' rewind stream and read Bytes
  Stream.Position = 0
  Stream.Type = adTypeBinary
  StringToBytes = Stream.Read
  Stream.Close
  Set Stream = Nothing
End Function

' accept Bytes array and convert it to a string using the selected charset
Function BytesToString(Bytes, Charset)
  Dim Stream : Set Stream = Server.CreateObject("ADODB.Stream")
  Stream.Charset = Charset
  Stream.Type = adTypeBinary
  Stream.Open
  Stream.Write Bytes
  Stream.Flush
  ' rewind stream and read text
  Stream.Position = 0
  Stream.Type = adTypeText
  BytesToString = Stream.ReadText
  Stream.Close
  Set Stream = Nothing
End Function

' Re-interpret a string: encode it to bytes using FromCharset, then decode
' those bytes using ToCharset (for example, from windows-1252 to windows-1251)
Function AlterCharset(Str, FromCharset, ToCharset)
  Dim Bytes
  Bytes = StringToBytes(Str, FromCharset)
  AlterCharset = BytesToString(Bytes, ToCharset)
End Function

So I just did this:

str = AlterCharset(str, "ISO-8859-1", "UTF-8")

And it worked nicely.
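
The argument order can look backwards relative to the question. One reading, offered as an assumption rather than something the answer states, is that the UTF-8 bytes from the database were decoded as ISO-8859-1 somewhere along the way, so each multi-byte character shows up as two or three Latin-1 characters. Re-encoding as ISO-8859-1 recovers the original bytes, and decoding those as UTF-8 yields the intended text. A minimal sketch for checking whether a string is in that state, using a hypothetical helper `CodePoints` (not part of the answer) that lists the UTF-16 code units of a string:

```vbscript
' Hypothetical diagnostic helper: returns the code units of Str as
' space-separated 4-digit hex values.
Function CodePoints(Str)
    Dim i, result
    result = ""
    For i = 1 To Len(Str)
        ' AscW can return a negative Integer for values above &H7FFF;
        ' Right(..., 4) keeps the low 16 bits either way.
        result = result & Right("0000" & Hex(AscW(Mid(Str, i, 1))), 4) & " "
    Next
    CodePoints = Trim(result)
End Function

' "é" is C3 A9 in UTF-8; decoded as ISO-8859-1 it becomes two characters.
Dim sample : sample = ChrW(&HC3) & ChrW(&HA9)
Response.Write CodePoints(sample)   ' writes 00C3 00A9
```

If the output shows pairs such as 00C3 00A9 where a single accented character was expected, the string is probably in this state, and after passing it through AlterCharset with "ISO-8859-1" and "UTF-8" the same check should report a single 00E9.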

Metalcoder answered Oct 27 '22 01:10


To expand on the OP's self-answer: when converting from a single-byte character set (such as ISO-8859-1, Windows-1251, or Windows-1252) to UTF-8, the round trip through ADODB's byte array is redundant. The overhead of the extra function calls and conversions can be eliminated as follows:

Const adTypeText = 2

Private Function AsciiStringToUTF8(AsciiString)
    Dim objStream: Set objStream = CreateObject("ADODB.Stream")
    Call objStream.Open()
    objStream.Type = adTypeText
    'Any single-byte charset should work in theory
    objStream.Charset = "Windows-1252"
    Call objStream.WriteText(AsciiString)
    'Rewind, switch the charset, and read the same bytes back as UTF-8
    objStream.Position = 0
    objStream.Charset = "UTF-8"
    AsciiStringToUTF8 = objStream.ReadText()
    Call objStream.Close(): Set objStream = Nothing
End Function
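
The comment in the function notes that any single-byte charset should work in theory. Which one is right depends on how the bytes were originally decoded: ISO-8859-1 and Windows-1252 assign bytes 0x80 to 0x9F differently, so a mismatch may corrupt some characters. A sketch of the same single-stream approach with the source charset passed in; the name `SingleByteStringToUTF8` and its `SourceCharset` parameter are hypothetical, not part of the answer:

```vbscript
' Omit this Const if adTypeText is already declared in the same script.
Const adTypeText = 2

' Hypothetical variant of AsciiStringToUTF8: the single-byte charset used to
' turn Str back into bytes is supplied by the caller instead of hard-coded.
Function SingleByteStringToUTF8(Str, SourceCharset)
    Dim objStream : Set objStream = CreateObject("ADODB.Stream")
    objStream.Open
    objStream.Type = adTypeText
    ' Encode the characters back into their original bytes
    objStream.Charset = SourceCharset
    objStream.WriteText Str
    ' Rewind and decode those bytes as UTF-8
    objStream.Position = 0
    objStream.Charset = "UTF-8"
    SingleByteStringToUTF8 = objStream.ReadText
    objStream.Close
    Set objStream = Nothing
End Function

' Usage (rawValue is a placeholder for the string fetched from the database):
' fixed = SingleByteStringToUTF8(rawValue, "ISO-8859-1")
```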

Makaveli84 answered Oct 27 '22 01:10