
Character encoding in HTML file upload

I have a simple HTML form:

<form action="/file/import" method="POST" enctype="multipart/form-data">
  <input id="csvFile" type="file" name="file">
</form>

I have a problem with uploading CSV files saved with different charsets. Is it possible to make the browser convert any file to UTF-8 and send it in this way to the server?

I tried several options and tracked the data sent by the browser with ngrep, after uploading a file originally saved in ISO-8859-2:

  1. Setting enctype to multipart/form-data; charset=utf-8 -> For some reason this resulted in the browser sending Content-Type: application/x-www-form-urlencoded, as if it had failed to use my specified enctype.

  2. Adding to the form tag an attribute: accept-charset="UTF-8" -> No effect.

  3. Using <meta charset="UTF-8"> in <head> -> No effect.

I think that file upload should work like this. I don't want my server to care about different encodings, but rather to receive data in a standardized way. If that is not possible, can I somehow send information about the encoding from the browser to the server? I would appreciate any advice, thanks.

asked Jan 14 '16 10:01 by astasiak



1 Answer

You have to differentiate between the content (the bytes) and the encoding (the interpretation of those bytes). The HTML file upload just transfers the bytes and does not care about their interpretation, because it is not limited to text files and can also transfer binaries. Since your server receives the bytes, it has to handle the interpretation.

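Setting enctype to multipart/form-data; charset=utf-8 does not trigger any conversion before or after the upload. The enctype attribute only accepts application/x-www-form-urlencoded, multipart/form-data and text/plain, so the browser treats the extended value as invalid and falls back to the default application/x-www-form-urlencoded, which matches what you saw in ngrep.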
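
Since the server only receives raw bytes, the interpretation step could look roughly like the sketch below. It is written in Python purely for illustration (the question does not say what the server runs on), and the function name to_utf8 and the fallback list are assumptions, not part of any framework. It tries UTF-8 first, because non-ASCII ISO-8859-2 text rarely happens to be valid UTF-8, and only then falls back to the charsets you expect from your users.

```python
def to_utf8(raw: bytes, fallbacks=("iso-8859-2",)) -> bytes:
    """Return the uploaded bytes re-encoded as UTF-8.

    Hypothetical helper: the candidate charsets are an assumption
    about what the uploading users typically save their CSVs in.
    """
    # "utf-8-sig" accepts plain UTF-8 and also strips a leading BOM.
    for charset in ("utf-8-sig", *fallbacks):
        try:
            text = raw.decode(charset)
        except UnicodeDecodeError:
            # These bytes are not valid in this charset; try the next one.
            continue
        return text.encode("utf-8")
    raise ValueError("none of the candidate charsets could decode the upload")
```

Note that single-byte charsets such as ISO-8859-2 accept any byte sequence, so the first one in the fallback list acts as a catch-all. This is a guess rather than a detection: if several legacy charsets are in play, the server needs extra information, for example an additional form field in which the user states the encoding.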

answered Oct 16 '22 16:10 by Hendric
