So I have an array of strings, and all of the strings are using the system default ANSI encoding and were pulled from a SQL database. So there are 256 different possible character byte values (single byte encoding).
Is there a way I can get json_encode() to work and display these characters instead of having to use utf8_encode() on all of my strings and ending up with stuff like \u0082? Or is that the standard for JSON?
JSON sent over HTTP is always encoded in UTF-8. Responses are parsed correctly when the server writes a content type header like application/json; charset=utf-8. However, many servers (including Play framework itself) use application/json without a charset. In Play 2.6, such responses are parsed as ISO-8859-1.
The json_encode() function is used to encode a value to JSON format.
These values (namely value1, value2, value3, ...) can contain any special characters. JSON is an acronym for JavaScript Object Notation, so you're asking whether there is a JS way to encode/decode a JavaScript object from and to a string? The answer is yes: JSON.
The JSON spec requires UTF-8 support by decoders. As a result, all JSON decoders can handle UTF-8 just as well as they can handle the numeric escape sequences. This is also the case for JavaScript interpreters, which means JSONP will handle UTF-8 encoded JSON as well.
Is there a way I can get json_encode() to work and display these characters instead of having to use utf8_encode() on all of my strings and ending up with stuff like "\u0082"?
If you have an ANSI encoded string, utf8_encode() is the wrong function for it. You need to properly convert it from ANSI to UTF-8 first. That will certainly reduce the number of Unicode escape sequences like \u0082 in the JSON output, but technically these sequences are valid JSON, so you need not fear them.
json_encode works with UTF-8 encoded strings only. If you need to create valid JSON from an ANSI encoded string, you need to re-encode/convert it to UTF-8 first. Then json_encode will just work as documented.
To convert from ANSI (more correctly, I assume you have a Windows-1252 encoded string, which is popularly but wrongly referred to as ANSI) to UTF-8, you can make use of the mb_convert_encoding() function:
$str = mb_convert_encoding($str, "UTF-8", "Windows-1252");
Another PHP function that can convert the encoding / charset of a string is iconv, which is based on libiconv. You can use it as well:
$str = iconv("CP1252", "UTF-8", $str);
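utf8_encode() only works for Latin-1 (ISO-8859-1), not for ANSI. The two differ in the byte range 0x80 to 0x9F: Windows-1252 puts printable characters there (the euro sign, curly quotes and so on), while utf8_encode() maps those bytes to control characters, which is where output like \u0082 comes from. So you will destroy some of the characters in the string when you run it through that function.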
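To make the difference concrete, here is a minimal sketch, assuming the mbstring extension is loaded. It converts the single byte 0x82 once as Latin-1 (the mapping utf8_encode() applies) and once as Windows-1252; the variable names are only illustrative.

```php
<?php
// Byte 0x82: a control character in Latin-1, but a low single
// quotation mark (U+201A) in Windows-1252.
$raw = "\x82";

// Same mapping utf8_encode() applies (Latin-1 to UTF-8).
$asLatin1 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

// The conversion that matches the actual source encoding.
$asCp1252 = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');

echo json_encode($asLatin1), "\n"; // "\u0082"
echo json_encode($asCp1252), "\n"; // "\u201a"
```

Both results are valid JSON, but only the second one decodes back to the character that was stored in the database.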
Related: What is ANSI format?
For more fine-grained control over what json_encode() returns, see the list of predefined constants (PHP version dependent, incl. PHP 5.4; some constants remain undocumented and are so far available in the source code only).
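One of those constants, JSON_UNESCAPED_UNICODE (available since PHP 5.4), is probably what the question is after: it makes json_encode() emit the UTF-8 characters themselves instead of \uXXXX escapes. A small sketch, assuming mbstring is loaded and using a made-up sample string:

```php
<?php
// "café" as Windows-1252 bytes (0xE9 is é), converted to UTF-8 first.
$utf8 = mb_convert_encoding("caf\xE9", 'UTF-8', 'Windows-1252');

$escaped   = json_encode($utf8);                         // default: escapes non-ASCII
$unescaped = json_encode($utf8, JSON_UNESCAPED_UNICODE); // keeps UTF-8 as-is

echo $escaped, "\n";   // "caf\u00e9"
echo $unescaped, "\n"; // "café"
```

The input still has to be UTF-8 either way; the flag only changes how the output is written.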
As you wrote in a comment that you have problems applying the function to an array, here is a code example. The encoding always needs to be changed before calling json_encode. That's just a standard array operation; for the simpler case of PDOStatement::fetch(), a foreach iteration:
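$items = array();
while ($row = $q->fetch(PDO::FETCH_ASSOC)) {
    // Convert every column value from Windows-1252 to UTF-8.
    foreach ($row as &$value) {
        $value = mb_convert_encoding($value, "UTF-8", "Windows-1252");
    }
    unset($value); # safety: remove reference
    // The row is UTF-8 now; do not run utf8_encode() over it again.
    $items[] = $row;
}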
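If the rows are already in an array, the same conversion could be wrapped in a small helper. This is only a sketch: the function name toUtf8() and the sample data are hypothetical, it assumes PHP 7 or later with mbstring, and it assumes the source encoding really is Windows-1252.

```php
<?php
// Hypothetical helper: convert every string in a (nested) array to UTF-8.
function toUtf8(array $data, string $from = 'Windows-1252'): array
{
    // $data is a local copy, so the caller's array is left untouched.
    array_walk_recursive($data, function (&$value) use ($from) {
        if (is_string($value)) {
            $value = mb_convert_encoding($value, 'UTF-8', $from);
        }
    });
    return $data;
}

// Sample rows as they might come back from the database (0xE9 is é).
$rows = [['name' => "Andr\xE9", 'id' => 1]];

$json = json_encode(toUtf8($rows), JSON_UNESCAPED_UNICODE);
echo $json, "\n"; // [{"name":"André","id":1}]
```

Non-string values such as integers and NULL are skipped, so they keep their JSON types.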
The JSON standard ENFORCES Unicode encoding. From RFC4627:
3. Encoding
JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.
Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.
00 00 00 xx  UTF-32BE
00 xx 00 xx  UTF-16BE
xx 00 00 00  UTF-32LE
xx 00 xx 00  UTF-16LE
xx xx xx xx  UTF-8
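The null-pattern check from that RFC section can be sketched in a few lines of PHP. The function name detectJsonEncoding() is made up for illustration, and treating inputs shorter than four bytes as UTF-8 is an assumption of this sketch, not something the quoted text specifies.

```php
<?php
// Hypothetical helper: guess the Unicode encoding of a JSON text from
// the pattern of null bytes in its first four octets (RFC 4627, section 3).
function detectJsonEncoding(string $bytes): string
{
    if (strlen($bytes) < 4) {
        return 'UTF-8'; // assumption: too short to inspect, use the default
    }

    // Build a pattern such as "x0x0": "0" for a null byte, "x" otherwise.
    $pattern = '';
    for ($i = 0; $i < 4; $i++) {
        $pattern .= ($bytes[$i] === "\x00") ? '0' : 'x';
    }

    switch ($pattern) {
        case '000x': return 'UTF-32BE';
        case '0x0x': return 'UTF-16BE';
        case 'x000': return 'UTF-32LE';
        case 'x0x0': return 'UTF-16LE';
        default:     return 'UTF-8';
    }
}

echo detectJsonEncoding('{"a":1}'), "\n";           // UTF-8
echo detectJsonEncoding("{\x00\"\x00a\x00"), "\n";  // UTF-16LE
```

Note that there is no row for a single-byte encoding such as Windows-1252 in the table: such input would simply be taken for UTF-8.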
Therefore, in the strictest sense, ANSI encoded JSON wouldn't be valid JSON; this is why PHP enforces Unicode encoding when using json_encode().
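This enforcement is easy to observe. The sketch below assumes PHP 7 or later with mbstring: passing a raw Windows-1252 byte to json_encode() fails with JSON_ERROR_UTF8, while the converted string encodes fine.

```php
<?php
$raw = "\x82"; // a lone Windows-1252 byte, not valid UTF-8

$result = json_encode($raw);  // fails and returns false
$error  = json_last_error();  // capture the error before the next call

// After converting to UTF-8 the same data encodes without error.
$fixed = json_encode(mb_convert_encoding($raw, 'UTF-8', 'Windows-1252'));

var_dump($result);                     // bool(false)
var_dump($error === JSON_ERROR_UTF8);  // bool(true)
echo $fixed, "\n";                     // "\u201a"
```

Checking the return value (or passing JSON_THROW_ON_ERROR on PHP 7.3+) avoids silently sending an empty response when a string slips through unconverted.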
As for "default ANSI", I'm pretty sure that your strings are encoded in Windows-1252. It is incorrectly referred to as ANSI.