I have a text input box, within a SPA built on AngularJS, for users to add a title to a printout. The input box is declared like this:
<input class="chart-title" type="text" ng-model="chartTitle" ng-change="titleChanged()"/>
The text box is filled with a default title provided by the server. A user may change the title to whatever suits them. When the title is changed, the server is updated and sends back a new title in the header of the response which then replaces the title in the box. This works perfectly for standard ASCII type characters.
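For context, a minimal sketch of how the controller side of this flow might be wired up ("app", "ChartController" and the /api/chart endpoint are my assumptions, not the actual application code):

// Sketch only: hypothetical wiring for the input above.
var app = angular.module('app');

app.controller('ChartController', function ($scope, $http) {
    $scope.chartTitle = 'Instrument: default';   // default title supplied by the server

    $scope.titleChanged = function () {
        $http.get('/api/chart', { params: { chartTitle: $scope.chartTitle } })
            .then(function (response) {
                // The server sends a (possibly adjusted) title back in a header,
                // which then replaces the value in the text box via ng-model.
                $scope.chartTitle = response.headers('chartTitle');
            });
    };
});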
However, for unicode characters (for example àßéçøö) it does not work. The text is sent down correctly, updated on the server correctly, and returned to the SPA correctly. The headers for the request/response are here:
Request URL:http://blahblahblah/api/.....&chartTitle=Instrument:%20%C3%A0%C3%9F%C3%A9%C3%A7%C3%B8%C3%B6
Response Headers:
chartTitle: Instrument: %C3%A0%C3%9F%C3%A9%C3%A7%C3%B8%C3%B6
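For reference, those percent escapes are just the UTF-8 bytes of the six accented characters, exactly what encodeURIComponent produces for them:

// Each character becomes two UTF-8 bytes, escaped as %XX pairs:
encodeURIComponent('àßéçøö');   // => "%C3%A0%C3%9F%C3%A9%C3%A7%C3%B8%C3%B6"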
The request is made using AngularJS $http(). As you can see, the values match up (the space in the request is encoded as %20, for obvious reasons). However, when I retrieve the header using headers("charttitle"), the value I receive is Instrument: Ã ÃŸÃ©Ã§Ã¸Ã¶
The javascript bundle is declared in the index with charset utf-8:
<script src="/js/bundle.js" type="text/javascript" charset="UTF-8"></script>
In addition, the HTML is declared with the correct charset, seemingly in two places within the head declaration:
<meta http-equiv="Content-Type" content="text/html charset=UTF-8" />
<meta charset="utf-8" />
According to this website (http://www.i18nqa.com/debug/utf8-debug.html), it appears that I am getting the Windows-1252 interpretation of the characters. This does not make any sense to me. I could, if absolutely necessary, write a horrible hack converting the Windows-1252 characters back to the UTF-8 string, but this seems a little extreme and quite error prone.
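To illustrate what that diagnosis means, a small sketch decoding the same two UTF-8 bytes both ways (illustration only; TextDecoder is not available in IE11):

var bytes = new Uint8Array([0xC3, 0xA0]);        // the UTF-8 encoding of "à"
new TextDecoder('utf-8').decode(bytes);          // => "à"  (correct)
new TextDecoder('windows-1252').decode(bytes);   // => "Ã " (Ã plus a non-breaking space)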
The effect is the same, whether on Chrome, Firefox or IE11. The full request headers are here:
Accept:application/json, text/plain, */*
Accept-Encoding:gzip, deflate, sdch, br
Accept-Language:en-GB,en-US;q=0.8,en;q=0.6
Connection:keep-alive
Host:blahblahblah
Origin:http://blahblahblah
Referer:http://blahblahblah/
User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36
Is there anything I have left out? Anything that has been forgotten?
EDIT
Full response headers as requested.
Access-Control-Allow-Origin:*
Access-Control-Expose-Headers:chartTitle
Cache-Control:private
chartTitle:Instrument: %C3%A0%C3%9F%C3%A9%C3%A7%C3%B8%C3%B6
Content-Disposition:attachment; filename=PrintData.pdf
Content-Length:1391643
Content-Type:application/octet-stream
Date:Fri, 20 Jan 2017 11:19:07 GMT
Server:Microsoft-IIS/10.0
X-AspNet-Version:4.0.30319
X-Powered-By:ASP.NET
X-SourceFiles:=?UTF-8?B?QzpcR2l0XEVPU1xSZXZpZXdlci5XZWJcYXBpXFByaW50XGQyOTNkNjA4NWVlYzlhNTEwYjQ5YThmZGQxNjNhMjAwMWZhYTFjMGY5YzhiMzUxYzE5ZjYxYWMwYTY1OWVhMDM=?=
The code around the headers:
$http({
    method: 'GET',
    url: filePath,
    params: {
        fileName: fileName
    },
    responseType: 'arraybuffer',
    headers: { 'Content-Type': 'application/json; charset=UTF-8' }
}).success(function (data, status, headers) {
    ready();
    if (status == 200) {
        var chartTitle = headers("charttitle");
        var printoutInformation = { 'chartTitle': chartTitle, 'pdfData': data };
        deferred.resolve(printoutInformation);
    }
    else {
        deferred.resolve(null);
    }
}).error(function (data) {
    ready();
    console.log(data);
});
return deferred.promise;
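For completeness, a hedged sketch of how the promise returned above might be consumed; the function name and the Blob handling are assumptions, not part of the actual application:

// Hypothetical caller of the function that wraps the $http call above.
getPrintoutInformation(filePath, fileName).then(function (info) {
    if (info) {
        console.log(info.chartTitle);   // currently arrives mangled for non-ASCII titles
        var blob = new Blob([info.pdfData], { type: 'application/pdf' });
        window.open(URL.createObjectURL(blob));
    }
});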
EDIT
The web.config for the API also specifies UTF-8:
<globalization requestEncoding="utf-8" responseEncoding="utf-8"/>
TL;DR
In a text box I want to display "Instrument: àßéçøö" but instead I am seeing "Instrument: Ã ÃŸÃ©Ã§Ã¸Ã¶"
Yes: 0xC0, 0xC1, and 0xF5 through 0xFF are invalid UTF-8 code units; they can never appear in well-formed UTF-8.
UTF-8 is a way of encoding Unicode so that an ASCII text file encodes to itself. No space is wasted beyond the initial bit of every byte that ASCII doesn't use, and if your file is mostly ASCII text with a few non-ASCII characters sprinkled in, the non-ASCII characters only make it a little longer.
Encoding in Node (and in JavaScript generally) is extremely confusing and difficult to get right. It helps, though, to realise that JavaScript strings are always held in memory as UTF-16, and that at most of the places where strings meet sockets, files, or byte arrays, the string gets re-encoded as UTF-8.
UTF-8 is an encoding system for Unicode. It can translate any Unicode character to a matching unique binary string, and can also translate that binary string back to a Unicode character. This is the meaning of "UTF": Unicode Transformation Format.
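A small sketch of that ASCII-compatibility point, counting the UTF-8 bytes with TextEncoder:

// Pure ASCII encodes to one byte per character; the accented characters take two.
new TextEncoder().encode('Instrument').length;   // => 10
new TextEncoder().encode('àßéçøö').length;       // => 12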
Here is your issue solved.
Based on this source, UTF-8 character debugging and its encoding and decoding: the response you are getting is made up of the characters of the UTF-8-encoded string, so you need to decode it in order to get your result.
Here is the code to do it:
var decoded = decodeURIComponent('%C3%A0%C3%9F%C3%A9%C3%A7%C3%B8%C3%B6');
console.log(decoded);
The result is => "àßéçøö"
We have to do this to get the actual string back, rather than its UTF-8 byte representation.
So, from your response you got Ã ÃŸÃ©Ã§Ã¸Ã¶. escape() turns each of those characters back into a %XX byte escape, and decodeURIComponent() then reads those bytes as UTF-8:
decodeURIComponent(escape("Ã ÃŸÃ©Ã§Ã¸Ã¶")) => "àßéçøö"
So, here is your method:
if (status == 200) {
    var original = headers("charttitle");
    var chartTitle = decodeURIComponent(escape(original));
    console.log(chartTitle);
    var printoutInformation = { 'chartTitle': chartTitle, 'pdfData': data };
    deferred.resolve(printoutInformation);
}
Now you will get the header value back the same as you sent it.
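One note on this: escape() is deprecated. If you would rather avoid it, here is a sketch of the same repair using TextDecoder, assuming the header value arrives Latin-1-mangled as above (TextDecoder and Uint8Array.from are not available in IE11). Inside the success callback you could do something like:

// Alternative sketch: treat each character of the mangled header value as a raw byte,
// then decode those bytes as UTF-8.
function decodeHeaderValue(value) {
    var bytes = Uint8Array.from(value, function (ch) { return ch.charCodeAt(0); });
    return new TextDecoder('utf-8').decode(bytes);
}
var chartTitle = decodeHeaderValue(headers('charttitle'));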
Try the below for encoding:
var myAngApp1 = document.getElementById("ItemSearch");
var uri = myAngApp1.value;
var place = encodeURIComponent(uri);