
Multipart/form-data and UTF-8 in an ASP Classic application

I have a problem that I really don't understand. I'm trying to upload files in a classic ASP app without using an external component. I also want to post some text that will be stored in a DB. The file upload works perfectly; I'm using this code: Upload Files Without COM v3 by Lewis E. Moten III

The problem is the other form input fields. I'm using UTF-8, but they don't end up as UTF-8. For example, the Swedish characters å, ä and ö are displayed as question marks if I print them out using Response.Write.

I have saved the files as UTF-8 (with BOM), added the meta tag declaring the page as UTF-8, and set Response.CharSet = "UTF-8".

The function to convert from binary to string looks like this (this is the only place I can think of that might be wrong, since the comments say that it pulls ANSI characters, but I think it should pull Unicode characters):

Private Function CStrU(ByRef pstrANSI)

    ' Converts an ANSI string to Unicode
    ' Best used for small strings

    Dim llngLength ' Length of ANSI string
    Dim llngIndex ' Current position

    ' determine length
    llngLength = LenB(pstrANSI)

    ' Loop through each character
    For llngIndex = 1 To llngLength

        ' Pull out ANSI character
        ' Get Ascii value of ANSI character
        ' Get Unicode Character from Ascii
        ' Append character to results
        CStrU = CStrU & Chr(AscB(MidB(pstrANSI, llngIndex, 1)))

    Next

End Function

I have created a test ASP page (multiparttest.asp) to replicate this. The upload code from Lewis E. Moten is required to make it work (I have added his files in a subdirectory called upload).

<%Response.CharSet = "UTF-8" %>
<!--#INCLUDE FILE="upload/clsUpload.asp"-->
<html>
    <head>
        <title>Test</title>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    </head>
    <body>
        <%
        Set objUpload = New clsUpload
        Response.Write( objUpload.Fields("testInput").Value )
        %>
        <form method="post" enctype="multipart/form-data" action="multiparttest.asp">
            <input type="text" name="testInput" />
            <input type="submit" value="submit" />
        </form>

    </body>
</html>

I have captured the request using LiveHTTP Headers in Firefox and saved it as a UTF-8 file. The Swedish characters look as they should (they didn't look OK in the LiveHTTP Headers GUI, but I'm guessing that the GUI itself doesn't use the correct encoding). This is what the POST request looks like:

http://localhost/testsite/multiparttest.asp

POST /testsite/multiparttest.asp HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://localhost/testsite/multiparttest.asp
Cookie: ASPSESSIONIDASBBRBTT=GLDJDBJALAMJFBFBDCCIONHF; ASPSESSIONIDAQABQBTT=DIPHILKAIICKJOIAIMILAMGE; ASPSESSIONIDCSABTCQS=KMHBLBLABKHCBGPNLMCIPPNJ
Content-Type: multipart/form-data; boundary=---------------------------7391102023625
Content-Length: 150
-----------------------------7391102023625
Content-Disposition: form-data; name="testInput"

åäö
-----------------------------7391102023625--

HTTP/1.x 200 OK
Cache-Control: private
Content-Length: 548
Content-Type: text/html; Charset=UTF-8
Server: Microsoft-IIS/7.0
X-Powered-By: ASP.NET
Date: Tue, 10 Nov 2009 14:20:17 GMT
----------------------------------------------------------

Any help in this matter is appreciated!

EDIT 10/11:

I've tried adding all of these to the top of the ASP file, following various suggestions I found for this problem elsewhere, but the result is the same.

<%@Language=VBScript codepage=65001 %>
<%Response.ContentType="text/html"%>
<%Response.Charset="UTF-8"%>
<%Session.CodePage=65001%>

EDIT 11/11:

This question seems related: UTF-8 text is garbled when form is posted as multipart/form-data. But they don't use ASP or IIS. Is it possible to set up some kind of character encoding for multipart/form-data in IIS? I'm using IIS7. Maybe my request does have the wrong encoding after all? (I'm really lost in the character encoding world right now.)

fredrik asked Nov 10 '09 15:11



2 Answers

Your analysis of CStrU is correct. It assumes that the client is sending single-byte ANSI characters. It also assumes that the code page used by the client is the same as the code page of the locale the VBScript is running in.

When using UTF-8, the assumptions made by CStrU will always be incorrect. There isn't, to my knowledge, a locale that has 65001 as its code page (I think there are one or two that use 65000, but that's different again).
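
To make the mismatch concrete, here is a minimal sketch of what a byte-at-a-time conversion does to UTF-8 input. It is written in Python rather than VBScript purely so it can be run outside IIS, and it uses Latin-1 as a stand-in for the server's single-byte ANSI code page, which is an assumption. The exact garbage you see (question marks in your case) depends on the code pages involved, but the cause is the same: each character arrives as two bytes and is treated as two characters.

```python
# The three characters from the captured POST body, encoded the way a
# browser encodes form fields for a UTF-8 page.
body = "åäö".encode("utf-8")

# Roughly what Chr(AscB(MidB(...))) does: every byte becomes its own
# character. Latin-1 stands in for the server's ANSI code page here
# (an assumption made for illustration).
per_byte = "".join(chr(b) for b in body)

# What a UTF-8 aware decoder produces from the same bytes.
decoded = body.decode("utf-8")

print(len(body))       # number of bytes on the wire
print(repr(per_byte))  # six unrelated single-byte characters
print(decoded)         # the original three characters
```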

Here is a replacement function that assumes the text is in UTF-8:

 Private Function CStrU(ByRef pstrANSI)

  Dim llngLength ' Length of the byte string
  Dim llngIndex ' Current position
  Dim bytVal
  Dim intChar

  ' Determine length in bytes
  llngLength = LenB(pstrANSI)

  ' Loop through the bytes, consuming one to three bytes per character
  llngIndex = 1
  Do While llngIndex <= llngLength

   bytVal = AscB(MidB(pstrANSI, llngIndex, 1))
   llngIndex = llngIndex + 1

   If bytVal < &h80 Then
    intChar = bytVal
   ElseIf bytVal < &hE0 Then

    intChar = (bytVal And &h1F) * &h40

    bytVal =  AscB(MidB(pstrANSI, llngIndex, 1))
    llngIndex = llngIndex + 1

    intChar = intChar + (bytVal And &h3f)

   ElseIf bytVal < &hF0 Then

    intChar = (bytVal And &hF) * &h1000

    bytVal =  AscB(MidB(pstrANSI, llngIndex, 1))
    llngIndex = llngIndex + 1

    intChar = intChar + (bytVal And &h3F) * &h40

    bytVal =  AscB(MidB(pstrANSI, llngIndex, 1))
    llngIndex = llngIndex + 1

    intChar = intChar + (bytVal And &h3F)

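   Else
    ' Lead bytes of 4-byte sequences (and anything else) are not decoded;
    ' substitute an inverted question mark (U+00BF)
    intChar = &hBF
   End If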

   CStrU = CStrU & ChrW(intChar)
  Loop

 End Function
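
If you want to sanity-check the branch logic away from the server, the sketch below ports the same one-, two- and three-byte handling to Python and compares it against Python's built-in decoder. The function name decode_utf8_bmp is made up for this illustration. Like the function above, it does no validation of continuation bytes or truncated input.

```python
def decode_utf8_bmp(data: bytes) -> str:
    """Decode 1- to 3-byte UTF-8 sequences, mirroring the branches above."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        i += 1
        if b < 0x80:
            # Single byte: plain ASCII
            cp = b
        elif b < 0xE0:
            # Two bytes: 5 bits from the lead byte, 6 from the next
            cp = (b & 0x1F) * 0x40 + (data[i] & 0x3F)
            i += 1
        elif b < 0xF0:
            # Three bytes: 4 bits from the lead byte, 6 + 6 from the next two
            cp = (b & 0x0F) * 0x1000
            cp += (data[i] & 0x3F) * 0x40
            cp += data[i + 1] & 0x3F
            i += 2
        else:
            # Anything else is not decoded; substitute U+00BF
            cp = 0xBF
        out.append(chr(cp))
    return "".join(out)


sample = "aåäö€".encode("utf-8")
print(decode_utf8_bmp(sample) == sample.decode("utf-8"))
```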

Note that once CStrU is corrected for UTF-8, the output of your example page will look wrong unless the code page is also set. The advice to set the code page of the file to 65001 is therefore also a requirement: since you are setting the CharSet sent to the client to "UTF-8", you also need to tell ASP to use the UTF-8 code page when encoding text written using Response.Write.

AnthonyWJones answered Sep 21 '22 01:09



I don't know if this will be any help, but I have worked with some classic ASP code to use the SWFUpload utility (Flash plugin that allows multiple file uploads in a batch).

The ASP sample code includes some comprehensive code that sorts out the byte/Unicode decoding, and it looks similar to the Chr(AscB(MidB(...))) call you mention. Perhaps seeing a second example might shed light on your problem.

Kristen answered Sep 20 '22 01:09
