
.Net multipart/form-data form enctype and UTF-8 "special" characters => � (MVC w/ HttpPostedFileBase)

Goal:

Upload / post a CSV file with UTF-8 characters to an MVC action, read the data, and store it in a database table.

Problem:

Only the plain ASCII characters make it through. UTF-8 "special" characters like á are not coming through correctly: in code and in the database they render as this character => �.

More:

I'm convinced that this isn't a problem with my C# code although I've included the important parts below.

I thought the problem was that the uploaded file is sent as plain text (the "text/plain" MIME type), but I was able to change that by changing the file extension to .html, and the characters were still wrong.

Summary:

How do you get a form with an enctype attribute set to "multipart/form-data" to correctly interpret UTF-8 characters in a posted file?

Research:

From my research this appears to be a common problem without a common and clear solution.

I've also found more solutions for Java and PHP than for .NET.


  • csvFile variable is of type HttpPostedFileBase

  • this is the MVC action signature

[HttpPost]

public ActionResult LoadFromCsv(HttpPostedFileBase csvFile)


Things I've tried:

1)

using (Stream inputStream = csvFile.InputStream)
{
    byte[] bytes = ReadFully(inputStream);
    string bytesConverted = new UTF8Encoding().GetString(bytes);
}

2)

using (Stream inputStream = csvFile.InputStream)
{
    using (StreamReader readStream = new StreamReader(inputStream, Encoding.UTF8, true))
    {
        while (!readStream.EndOfStream)
        {
            string csvLine = readStream.ReadLine();
            // string csvLine = new UTF8Encoding().GetString(new UTF8Encoding().GetBytes(readStream.ReadLine())); // stupid... this can not be the way!
        }
    }
}

3)

<form method="post" enctype="multipart/form-data" accept-charset="UTF-8">

4)

<input type="file" id="csvFile" name="csvFile" accept="UTF-8" />

<input type="file" id="csvFile" name="csvFile" accept="text/html" />

5)

When the file has a .txt extension, the ContentType property of the HttpPostedFileBase is "text/plain"

When I change the file extension from .txt to .csv the ContentType property of the HttpPostedFileBase is "application/vnd.ms-excel"

When I change the file extension to .html, the ContentType property of the HttpPostedFileBase is "text/html" - I thought this was going to be a winner, but it wasn't.


In my soul I have to believe there is an easy solution to this problem. It surprises me that I haven't been able to figure this one out on my own; uploading UTF-8 characters in a file is a common task! Why am I failing here?!?!

Perhaps I have to adjust mime types in IIS for the website?

Perhaps I need different DOCTYPE / html tag / meta tags?


@Gabe -

Here is what my post looks like in fiddler. This is really interesting because the � is plain as day, right there in the post value.

POST http://localhost/AwesomeGeography/GeoBytesCities/LoadFromCsv?adsf HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Referer: http://localhost/AwesomeGeography/GeoBytesCities/LoadFromCsv?adsf
Content-Type: multipart/form-data; boundary=---------------------------199122566726299
Content-Length: 354

-----------------------------199122566726299
Content-Disposition: form-data; name="csvFile"; filename="cities_test.html"
Content-Type: text/html

"CityId","CountryID","RegionID","City","Latitude","Longitude","TimeZone","DmaId","Code"
3344,10,1063,"Luj�n de Cuyo","-33.05","-68.867","-03:00",0,"LDCU"
-----------------------------199122566726299--
Dudeman3000 asked Jun 03 '12 16:06



2 Answers

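I had the same problem. You can use

StreamReader reader = new StreamReader(archivo_origen.InputStream, Encoding.GetEncoding("iso-8859-1"));

and it works. "iso-8859-1" (Latin-1) covers Western European languages such as Spanish, German, and French. This only helps when the file was actually saved in that single-byte encoding rather than in UTF-8.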
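
To illustrate why switching the decoder can help, here is a minimal sketch, assuming the uploaded file stored á as the single Latin-1 byte 0xE1 (which matches the � seen when reading it as UTF-8). The class and member names (MojibakeDemo, Latin1Bytes, AsUtf8, AsLatin1) are hypothetical and exist only for this example.

```csharp
using System.Text;

public static class MojibakeDemo
{
    // "Luján" as a Latin-1 file would store it: á is the lone byte 0xE1.
    // (Assumption for illustration; a real UTF-8 file would store á as 0xC3 0xA1.)
    public static readonly byte[] Latin1Bytes = { 0x4C, 0x75, 0x6A, 0xE1, 0x6E };

    // Decoding with UTF-8: 0xE1 followed by 'n' is not a valid UTF-8 sequence,
    // so the decoder substitutes the replacement character U+FFFD.
    public static string AsUtf8(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    // Decoding with ISO-8859-1: every byte maps directly to one character,
    // so 0xE1 becomes U+00E1 (á).
    public static string AsLatin1(byte[] bytes)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
    }
}
```

Calling MojibakeDemo.AsUtf8(MojibakeDemo.Latin1Bytes) yields the string with � in place of á, while MojibakeDemo.AsLatin1 recovers "Luján" from the same bytes. The bytes never change; only the interpretation does.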

Diego_DX answered Nov 12 '22 18:11



Based on the information given, I would guess that the problem is with the file encoding itself - not with your code.

I ran a simple test to demonstrate this:

  1. I exported a simple CSV file from Excel containing special characters.

  2. Then, I uploaded it through the following form and action method.

Form

<form method="post" action="@Url.Action("UploadFile", "Home")" enctype="multipart/form-data">
    <input type="file" id="file" name="file" />
    <input type="submit" />
</form>

Action method

[HttpPost]
public ActionResult UploadFile(HttpPostedFileBase file)
{
    using (StreamReader reader = new StreamReader(file.InputStream, System.Text.Encoding.UTF8))
    {
        string text = reader.ReadToEnd();
    }

    return RedirectToAction("Index");
}

I had the same problem as you in this case - the special characters were replaced with �.

I opened the file in Notepad and the special characters were displayed correctly there, so it seemed that it couldn't be a file problem, but when I opened the "Save As" dialog, the selected encoding was "ANSI". I switched it to UTF-8 and saved it, ran it through the uploader, and it all worked fine.
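
If re-saving every file by hand is not practical, one option is to detect the mismatch on the server. The following is a minimal sketch, not a definitive implementation: it tries a strict UTF-8 decode first and falls back to ISO-8859-1 when the bytes are not valid UTF-8. The names UploadDecoding and DecodeUpload are hypothetical, and choosing ISO-8859-1 as the fallback is an assumption (a file saved as "ANSI" on a Western European Windows machine is usually Windows-1252, which agrees with ISO-8859-1 for characters like á but not for all characters).

```csharp
using System.Text;

public static class UploadDecoding
{
    // Strict UTF-8: throws DecoderFallbackException on invalid bytes
    // instead of silently emitting U+FFFD.
    private static readonly Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Decodes the raw bytes of an upload and reports which encoding was used.
    public static string DecodeUpload(byte[] bytes, out string encodingUsed)
    {
        try
        {
            string text = StrictUtf8.GetString(bytes);
            encodingUsed = "utf-8";
            // GetString keeps a leading byte order mark as U+FEFF; drop it if present.
            return text.Length > 0 && text[0] == '\uFEFF' ? text.Substring(1) : text;
        }
        catch (DecoderFallbackException)
        {
            // Assumed fallback for files that were not saved as UTF-8.
            encodingUsed = "iso-8859-1";
            return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
        }
    }
}
```

In the action method above, the bytes could be obtained by copying file.InputStream into a MemoryStream and calling ToArray() before passing them to DecodeUpload. A file that happens to be valid UTF-8 by coincidence would still be read as UTF-8, so this is a heuristic rather than a guarantee.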

Spectre87 answered Nov 12 '22 18:11
