How to get proper Java string from Python created string 'Oslobo\xc4\x91enja'? How to decode it? I've tryed I think everything, looked everywhere, I've been stuck for 2 days with this problem. Please help!
Here is the Python's web service method that returns JSON from which Java client with Google Gson parses it.
def list_of_suggestions(entry):
input = entry.encode('utf-8')
"""Returns list of suggestions from auto-complete search"""
json_result = { 'suggestions': [] }
resp = urllib2.urlopen('https://maps.googleapis.com/maps/api/place/autocomplete/json?input=' + urllib2.quote(input) + '&location=45.268605,19.852924&radius=3000&components=country:rs&sensor=false&key=blahblahblahblah')
# make json object from response
json_resp = json.loads(resp.read())
if json_resp['status'] == u'OK':
for pred in json_resp['predictions']:
if pred['description'].find('Novi Sad') != -1 or pred['description'].find(u'Нови Сад') != -1:
obj = {}
obj['name'] = pred['description'].encode('utf-8').encode('string-escape')
obj['reference'] = pred['reference'].encode('utf-8').encode('string-escape')
json_result['suggestions'].append(obj)
return str(json_result)
Here is solution on Java client
private String python2JavaStr(String pythonStr) throws UnsupportedEncodingException {
int charValue;
byte[] bytes = pythonStr.getBytes();
ByteBuffer decodedBytes = ByteBuffer.allocate(pythonStr.length());
for (int i = 0; i < bytes.length; i++) {
if (bytes[i] == '\\' && bytes[i + 1] == 'x') {
// \xc4 => c4 => 196
charValue = Integer.parseInt(pythonStr.substring(i + 2, i + 4), 16);
decodedBytes.put((byte) charValue);
i += 3;
} else
decodedBytes.put(bytes[i]);
}
return new String(decodedBytes.array(), "UTF-8");
}
String objects in Java are encoded in UTF-16. Java Platform is required to support other character encodings or charsets such as US-ASCII, ISO-8859-1, and UTF-8. Errors may occur when converting between differently coded character data. There are two general types of encoding errors.
UTF-8 is an encoding system for Unicode. It can translate any Unicode character to a matching unique binary string, and can also translate the binary string back to a Unicode character. This is the meaning of “UTF”, or “Unicode Transformation Format.”
You are returning the string version of the python data structure.
Return an actual JSON response instead; leave the values as Unicode:
if json_resp['status'] == u'OK':
for pred in json_resp['predictions']:
desc = pred['description']
if u'Novi Sad' in desc or u'Нови Сад' in desc:
obj = {
'name': pred['description'],
'reference': pred['reference']
}
json_result['suggestions'].append(obj)
return json.dumps(json_result)
Now Java does not have to interpret Python escape codes, and can parse valid JSON instead.
Python escapes unicode characters by converting their UTF-8 bytes into a series of \xVV values, where VV is the hex value of the byte. This is very different from the java unicode escapes, which are just a single \uVVVV per character, where VVVV is hex UTF-16 encoding.
Consider:
\xc4\x91
In decimal, those hex values are:
196 145
then (in Java):
byte[] bytes = { (byte) 196, (byte) 145 };
System.out.println("result: " + new String(bytes, "UTF-8"));
prints:
result: đ
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With