I've been working with JSON for some time, and the problem is that the strings I decode come out as Latin-1; I cannot get them decoded as UTF-8. Because of that, some characters are shown incorrectly (ex. ' shown as ').
I've read a few questions here on Stack Overflow, but their answers don't seem to work.
The JSON structure I'm working with looks like this (it is from the YouTube API):
...
"items": [
    {
        ...
        "snippet": {
            ...
            "title": "Powerbeats Pro “Totally Wireless” Except when you need a wire",
            ...
        }
    }
]
I decode it with:
response = await http.get(link, headers: {HttpHeaders.contentTypeHeader: "application/json; charset=utf-8"});
extractedData = json.decode(response.body);
dataTech = extractedData["items"];
And then what I tried was changing the second line to:
extractedData = json.decode(utf8.decode(response.body));
But this gave me an error about wrong format. So I changed it to:
extractedData = json.decode(utf8.decode(response.bodyBytes));
This doesn't throw the error, but it doesn't fix the problem either. Playing around with the headers doesn't help.
I would like the data to be stored in dataTech as it is now, but decoded as UTF-8. What am I doing wrong?
Just an aside first: UTF-8 is typically an external format, and typically represented by an array of bytes. It's what you might send over the network as part of an HTTP response. Internally, Dart stores strings as sequences of UTF-16 code units. The utf8 encoder/decoder converts between internal format strings and external format arrays of bytes.
This is why you are using utf8.decode(response.bodyBytes): taking the raw body bytes and converting them to an internal string. (response.body basically does this too, but it chooses the bytes->string decoder based on the response header charset. When this charset is missing from the header (as it often is), the http package picks Latin-1, which obviously doesn't work if you know that the response is in a different charset.) By using utf8.decode yourself, you are overriding the (potentially wrong) choice being made by http because you know that this particular server always sends UTF-8. (It may not, of course!)
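As a rough illustration of the paragraph above, the following sketch mimics that charset selection using only dart:convert and dart:io. Note that encodingFor is a hypothetical helper written for this example, not the http package's actual code, and its Latin-1 fallback simply follows the behaviour described above.

```dart
import 'dart:convert';
import 'dart:io' show ContentType;

/// Hypothetical helper: pick a decoder from a Content-Type header value,
/// falling back to Latin-1 when no (known) charset is given.
Encoding encodingFor(String? contentTypeHeader) {
  if (contentTypeHeader == null) return latin1;
  final charset = ContentType.parse(contentTypeHeader).charset;
  return Encoding.getByName(charset) ?? latin1;
}

void main() {
  const title = 'Powerbeats Pro “Totally Wireless”';
  // What the server puts on the wire: UTF-8 bytes.
  final bytes = utf8.encode(title);

  // No charset in the header, so the fallback decoder mangles the quotes.
  final guessed = encodingFor('application/json').decode(bytes);
  // Decoding explicitly as UTF-8 recovers the original string.
  final explicit = utf8.decode(bytes);

  print(guessed == title); // false
  print(explicit == title); // true
}
```

The same bytes give two different strings depending on the decoder, which is why decoding bodyBytes yourself helps only when the header-based guess was the actual cause.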
Another aside: setting a content type header on a request is rarely useful. You typically aren't sending any content, so it doesn't have a type! And it doesn't influence the content type or content type charset that the server will send back to you. The accept header might be what you are looking for. That's a hint to the server of what type of content you'd like back, but not all servers respect it.
So why are your special characters still incorrect? Try printing utf8.decode(response.bodyBytes) before decoding it. Does it look right in the console? (It is very useful to create a simple Dart command line application for this type of issue; I find it easier to set breakpoints and inspect variables in a simple ten line Dart app.) Try using something like Wireshark to capture the bytes on the wire (again, useful to have the simple Dart app for this). Or try using Postman to send the same request and inspect the response.
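A minimal sketch of such a command line check, assuming the body has the shape shown in the question. The bodyBytes value here is a stand-in built locally (a real app would take it from the response), and codePoints is a hypothetical helper. Printing code points rather than glyphs separates decoding problems from display problems.

```dart
import 'dart:convert';

/// Hypothetical helper: list a string's code points as U+XXXX labels.
List<String> codePoints(String s) => s.runes
    .map((r) => 'U+${r.toRadixString(16).toUpperCase().padLeft(4, '0')}')
    .toList();

void main() {
  // Stand-in for response.bodyBytes.
  final bodyBytes = utf8.encode('{"items":[{"snippet":{"title":"“Hi”"}}]}');

  // Step 1: look at the decoded text before parsing it as JSON.
  final text = utf8.decode(bodyBytes);
  print(text);

  // Step 2: parse, then inspect the actual code points of the title.
  final data = json.decode(text) as Map<String, dynamic>;
  final title = data['items'][0]['snippet']['title'] as String;
  print(codePoints(title)); // [U+201C, U+0048, U+0069, U+201D]
}
```

If the curly quotes come out as U+201C and U+201D, the decoding is fine and the problem lies in how the string is displayed.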
How are you trying to show the characters? It may simply be that the font you are using doesn't have them.
Just add the header 'Accept': 'application/json; charset=UTF-8'.
It worked for me.