I am having problems decoding UTF-8 strings in POST data when using the Node.JS web server.
See this complete testcase:
require("http").createServer(function(request, response) {
if (request.method != "POST") {
response.writeHead(200, {'Content-Type': 'text/html; charset=utf-8'});
response.end('<html>'+
'<head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head>'+
'<body>'+
'<form method="post">'+
'<input name="test" value="Grüße!"><input type="submit">'+
'</form></body></html>');
} else {
console.log("CONTENT TYPE=",request.headers['content-type']);
var body="";
request.on('data', function (data) {
body += data;
});
request.on('end', function () {
console.log("POST BODY=",body);
response.writeHead(200, {'Content-Type': 'text/plain; charset=utf-8'});
response.end("POST DATA:\n"+body+"\n---\nUNESCAPED:\n"+unescape(body)+
"\n---\nHARDCODED: Grüße!");
});
}
}).listen(11180);
This is a standalone web server that listens on port 11180 and serves an HTML page with a simple form containing an input field with special characters. POSTing that form back to the server echoes its contents in a plain-text response.
My problem is that the special characters are not displayed properly, either on the console or in the browser. This is what I see with both Firefox and IE:
POST DATA:
test=Gr%C3%BC%C3%9Fe%21
---
UNESCAPED:
test=GrüÃe!
---
HARDCODED: Grüße!
The last line is the hardcoded string Grüße! that should match the value of the input field (to verify that it's not a display problem). Obviously the POST data is not being interpreted as UTF-8. The same problem happens when using require('querystring') to break the data into fields.
Any clue?
Using Node.JS v0.4.11 on Debian Linux 4; the source file is saved in the UTF-8 charset.
The ü and ß characters are not part of the ASCII charset, so in the URL-encoded body each one is represented by multiple percent-escaped ASCII characters.
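Each of those characters becomes two UTF-8 bytes, and each byte is percent-escaped separately, which matches the %C3%BC and %C3%9F sequences in the POST body above:

encodeURIComponent("ü"); // "%C3%BC" (two UTF-8 bytes, each percent-escaped)
encodeURIComponent("ß"); // "%C3%9F"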
According to http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1
The content type "application/x-www-form-urlencoded" is inefficient for sending large quantities of binary data or text containing non-ASCII characters. The content type "multipart/form-data" should be used for submitting forms that contain files, non-ASCII data, and binary data.
Switching the form's enctype to multipart, <form method="post" enctype="multipart/form-data">, will transmit the text as the literal UTF-8 characters. You then have to parse the multipart format yourself; node-formidable seems to be the most popular library for doing so.
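A minimal sketch of that route, assuming the classic formidable IncomingForm API (installed with npm install formidable); the field name mirrors the testcase above:

var formidable = require('formidable');

require("http").createServer(function(request, response) {
    if (request.method == "POST") {
        var form = new formidable.IncomingForm();
        // formidable reads and parses the multipart body for us
        form.parse(request, function(err, fields, files) {
            // fields.test arrives as an already-decoded UTF-8 string, e.g. "Grüße!"
            response.writeHead(200, {'Content-Type': 'text/plain; charset=utf-8'});
            response.end("test=" + fields.test);
        });
    }
    // (GET handling to serve the form omitted for brevity)
}).listen(11180);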
It's probably much simpler to use decodeURIComponent()
as you mentioned in a comment. unescape() does not handle multibyte characters and instead represents each byte as its own character, hence the garbling you're seeing. See http://xkr.us/articles/javascript/encode-compare/ for a comparison.
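Applied to the POST body from the testcase (note that decodeURIComponent() leaves + unchanged, so real form data may also need a .replace(/\+/g, ' ') first):

decodeURIComponent("test=Gr%C3%BC%C3%9Fe%21"); // "test=Grüße!"
unescape("test=Gr%C3%BC%C3%9Fe%21");           // garbled: each %XX byte becomes a separate character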
You can also use buffers to change the encoding. Overkill in this case, but if you needed to:
new Buffer(myString, 'ascii').toString('utf8');