
UTF-8 string not decoded correctly in AngularJS

I have a text input box, within a SPA built on AngularJS, for users to add a title to a printout. The input box is declared like this:

<input class="chart-title" type="text" ng-model="chartTitle" ng-change="titleChanged()"/>

The text box is filled with a default title provided by the server. A user may change the title to whatever suits them. When the title is changed, the server is updated and sends back the new title in a response header, which then replaces the title in the box. This works perfectly for standard ASCII characters.

However, for non-ASCII Unicode characters (for example àßéçøö) it does not work. The text is sent to the server correctly, updated on the server correctly, and returned to the SPA correctly. The headers for the request/response are here:

Request URL:http://blahblahblah/api/.....&chartTitle=Instrument:%20%C3%A0%C3%9F%C3%A9%C3%A7%C3%B8%C3%B6

Response Headers:

chartTitle: Instrument: %C3%A0%C3%9F%C3%A9%C3%A7%C3%B8%C3%B6

The request is made using AngularJS $http(). As you can see, the values match up (the space in the request is encoded as %20 for obvious reasons). However, when I retrieve the header using headers("charttitle"), the value I receive is Instrument: àÃéçøö

The JavaScript bundle is declared in the index with charset UTF-8:

<script src="/js/bundle.js" type="text/javascript" charset="UTF-8"></script>

In addition, the HTML declares the charset, as far as I can tell in two places within the head:

<meta http-equiv="Content-Type" content="text/html charset=UTF-8" />
<meta charset="utf-8" />

According to this website (http://www.i18nqa.com/debug/utf8-debug.html), it appears that the UTF-8 bytes are being interpreted as Windows-1252. This does not make any sense to me. I could, if absolutely necessary, write a horrible hack converting between the UTF-8 string and Windows-1252 characters, but this seems a little extreme and quite error-prone.
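To illustrate that diagnosis, here is a minimal sketch that reproduces the symptom outside AngularJS. It assumes the mechanism is "UTF-8 bytes read one byte per character", which is not confirmed by anything on the server side. It uses the plain byte-to-code-point mapping of ISO-8859-1, which differs from Windows-1252 only for bytes 0x80 to 0x9F.

```javascript
// Sketch: reproduce the symptom by reading UTF-8 bytes one byte per character.
const title = "àßéçøö";

// UTF-8 bytes of the title: c3 a0 c3 9f c3 a9 c3 a7 c3 b8 c3 b6
const bytes = new TextEncoder().encode(title);

// Misread: treat every byte as its own character (ISO-8859-1 style mapping).
const misread = Array.from(bytes, (b) => String.fromCharCode(b)).join("");

console.log(bytes.length);      // 12 bytes for 6 characters
console.log(misread.length);    // 12 characters instead of 6
console.log(misread === title); // false
```

Every character in the title needs two bytes in UTF-8, so the misread string is twice as long and each pair starts with the character for byte 0xC3.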

The effect is the same, whether on Chrome, Firefox or IE11. The full request headers are here:

Accept:application/json, text/plain, */*
Accept-Encoding:gzip, deflate, sdch, br
Accept-Language:en-GB,en-US;q=0.8,en;q=0.6
Connection:keep-alive
Host:blahblahblah
Origin:http://blahblahblah
Referer:http://blahblahblah/
User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36

Is there anything I have left out? Anything that has been forgotten?

EDIT

Full response headers as requested.

Access-Control-Allow-Origin:*
Access-Control-Expose-Headers:chartTitle
Cache-Control:private
chartTitle:Instrument: %C3%A0%C3%9F%C3%A9%C3%A7%C3%B8%C3%B6
Content-Disposition:attachment; filename=PrintData.pdf
Content-Length:1391643
Content-Type:application/octet-stream
Date:Fri, 20 Jan 2017 11:19:07 GMT
Server:Microsoft-IIS/10.0
X-AspNet-Version:4.0.30319
X-Powered-By:ASP.NET
X-SourceFiles:=?UTF-8?B?QzpcR2l0XEVPU1xSZXZpZXdlci5XZWJcYXBpXFByaW50XGQyOTNkNjA4NWVlYzlhNTEwYjQ5YThmZGQxNjNhMjAwMWZhYTFjMGY5YzhiMzUxYzE5ZjYxYWMwYTY1OWVhMDM=?=

Code around the headers

$http({
    method: 'GET',
    url: filePath,
    params: {
        fileName: fileName
    },
    responseType: 'arraybuffer',
    headers: {'Content-Type' : 'application/json; charset=UTF-8'}
}).success(function (data, status, headers) {
    ready();
    if (status == 200) {
        var chartTitle = headers("charttitle");
        var printoutInformation = {'chartTitle' : chartTitle, 'pdfData' : data};
        deferred.resolve(printoutInformation);
    } else {
        deferred.resolve(null);
    }
}).error(function (data) {
    ready();
    console.log(data);
});
return deferred.promise;

EDIT

The web.config for the api also specifies utf-8:

    <globalization requestEncoding="utf-8" responseEncoding="utf-8"/>

TL;DR

In a text box I want to display "Instrument: àßéçøö" and instead I am seeing "Instrument: à Ãéçøö"

David Setty asked Jan 18 '17 11:01




2 Answers

Here is your issue solved.

Based on this source,

UTF-8 character debugging and its encoding and decoding

The value you are getting is the UTF-8 encoded string with each byte shown as a separate character.

So you need to decode it in order to get your result.

Here is the code to do it.

    var decoded = decodeURIComponent('%C3%A0%C3%9F%C3%A9%C3%A7%C3%B8%C3%B6');

    console.log(decoded);

    // The result is => "àßéçøö"

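We have to do this to get the actual string back from its UTF-8 bytes.

So, from your response you got à Ãéçøö, which is the UTF-8 byte sequence with each byte read as a single character. escape() turns each of those characters back into a %XX byte escape, and decodeURIComponent() then decodes the bytes as UTF-8:

decodeURIComponent(escape("\u00C3\u00A0\u00C3\u009F\u00C3\u00A9\u00C3\u00A7\u00C3\u00B8\u00C3\u00B6")) => "àßéçøö"

(The received characters are written as \u escapes here because two of them, U+00A0 and U+009F, are invisible when printed.)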

DEFINITION:

decodeURIComponent():

  • Returns a new string representing the decoded version of the given encoded Uniform Resource Identifier (URI) component.
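One thing worth knowing is that decodeURIComponent() throws a URIError when the input is not valid percent-encoded UTF-8. A minimal sketch of a defensive wrapper follows; the name safeDecode is hypothetical, and falling back to the raw value is only one possible choice.

```javascript
// Hypothetical helper: decode a percent-encoded value, falling back to the
// raw input when it is not valid percent-encoded UTF-8.
function safeDecode(value) {
  try {
    return decodeURIComponent(value);
  } catch (e) {
    if (e instanceof URIError) {
      return value; // malformed sequence such as "%E0%20"
    }
    throw e; // anything else is unexpected, so rethrow it
  }
}

console.log(safeDecode("Instrument: %C3%A0%C3%9F%C3%A9%C3%A7%C3%B8%C3%B6")); // Instrument: àßéçøö
console.log(safeDecode("%E0%20")); // %E0%20 (returned unchanged)
```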

So, here is your updated success handler.

if (status == 200) {
    var original = headers("charttitle");
    var chartTitle = decodeURIComponent(escape(original));
    console.log(chartTitle);
    var printoutInformation = {'chartTitle' : chartTitle, 'pdfData' : data};
    deferred.resolve(printoutInformation);
}

Now the title you read from the header will be the same as the one you sent.
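Since escape() is deprecated, the same reinterpretation can also be sketched with TextDecoder. This is an alternative under the same assumption (each character of the header value stands for one UTF-8 byte), not part of the original fix; the helper name reinterpretAsUtf8 is hypothetical.

```javascript
// Hypothetical helper: undo "UTF-8 bytes read as one character per byte".
function reinterpretAsUtf8(misread) {
  // Take each character's code unit (0-255) back as a raw byte.
  const bytes = Uint8Array.from(misread, (ch) => {
    const code = ch.charCodeAt(0);
    if (code > 0xff) {
      throw new RangeError("not a one-byte-per-character string");
    }
    return code;
  });
  // fatal: true makes invalid UTF-8 throw a TypeError instead of yielding U+FFFD.
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

// The misread header value, written with \u escapes so every byte is visible.
const received =
  "Instrument: \u00C3\u00A0\u00C3\u009F\u00C3\u00A9\u00C3\u00A7\u00C3\u00B8\u00C3\u00B6";

console.log(reinterpretAsUtf8(received)); // Instrument: àßéçøö
```

In the handler above this would replace the decodeURIComponent(escape(original)) call with reinterpretAsUtf8(original).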

Sravan answered Oct 26 '22 23:10


Try the following to encode the value:

var myAngApp1 = document.getElementById("ItemSearch");
var uri = myAngApp1.value;
var place = encodeURIComponent(uri);
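As a quick illustration of what encodeURIComponent() produces for the title in the question, here is a small round-trip sketch. It runs without a DOM, so the input element is replaced by a hard-coded string.

```javascript
// Round trip: percent-encode the title as UTF-8, then decode it again.
const title = "Instrument: àßéçøö";

const encoded = encodeURIComponent(title);
console.log(encoded); // Instrument%3A%20%C3%A0%C3%9F%C3%A9%C3%A7%C3%B8%C3%B6

const decoded = decodeURIComponent(encoded);
console.log(decoded === title); // true
```

The encoded form contains only ASCII characters, so it can travel in a URL or header without depending on how the receiver interprets non-ASCII bytes.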

Sanjeev Gautam answered Oct 26 '22 23:10