 

Post UTF-8 encoded data to server loses certain characters

I am working on a project that involves communication between a server (Java EE app) and a client (Android app). XML is sent as one of the POST parameters of the HTTP request (named "xml"). There are a few other POST parameters that I pass to the server, but I removed them from the function below for simplicity. The problem is that certain letters are not delivered to the server properly, for example the character Ű (note that this is not the German Ü, which, by the way, is delivered properly). The code for sending is the following:

    private String postSyncXML(String XML) {
        String url = "http://10.0.2.2:8080/DebugServlet/DebugServlet";
        HttpClient httpclient = new DefaultHttpClient();

        List<NameValuePair> nameValuePairs = new ArrayList<NameValuePair>();
        nameValuePairs.add(new BasicNameValuePair("xml", XML));

        UrlEncodedFormEntity form;
        try {
            form = new UrlEncodedFormEntity(nameValuePairs);
            form.setContentEncoding(HTTP.UTF_8);
            HttpPost httppost = new HttpPost(url);
            httppost.setEntity(form);

            HttpResponse response = (HttpResponse) httpclient.execute(httppost);
            HttpEntity resEntity = response.getEntity();
            String resp = EntityUtils.toString(resEntity);
            Log.i(TAG, "postSyncXML srv response:" + resp);
            return resp;
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        } catch (ClientProtocolException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

My guess is that the problem is in the BasicNameValuePair I use to set the XML as one of the POST parameters; its documentation says it uses the US-ASCII character set. What is the proper way to send UTF-8 encoded POST fields?

dstefanox asked Mar 11 '11 08:03





1 Answer

After much research and many attempts to get things working, I finally found a solution, and it is a simple addition to the existing code: pass "UTF-8" as the second argument to the UrlEncodedFormEntity constructor:

    form = new UrlEncodedFormEntity(nameValuePairs, "UTF-8");

After this change, characters were encoded and delivered properly to the server side.
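A likely explanation for the symptom: as far as I can tell, the single-argument UrlEncodedFormEntity constructor in HttpClient 4.0/4.1 falls back to the library's default charset, ISO-8859-1, and setContentEncoding only sets the Content-Encoding header rather than the charset used to encode the form body. Ü exists in ISO-8859-1 while Ű does not, which would match what was observed. The sketch below reproduces the effect with the JDK's URLEncoder only; the class name FormCharsetDemo and the helper encodeField are made up for illustration and are not part of HttpClient.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FormCharsetDemo {

    // Hypothetical helper: encodes one form field as name=value in the given
    // charset, roughly what a URL-encoded form body contains for each pair.
    static String encodeField(String name, String value, String charset) {
        try {
            return URLEncoder.encode(name, charset) + "=" + URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String uUmlaut = "\u00DC";      // Ü, representable in ISO-8859-1
        String uDoubleAcute = "\u0170"; // Ű, not representable in ISO-8859-1

        // Ü survives ISO-8859-1 as a single byte.
        System.out.println(encodeField("xml", uUmlaut, "ISO-8859-1"));      // xml=%DC
        // Ű has no ISO-8859-1 byte, so it is replaced by '?' before escaping.
        System.out.println(encodeField("xml", uDoubleAcute, "ISO-8859-1")); // xml=%3F
        // With UTF-8 the character is preserved as two bytes.
        System.out.println(encodeField("xml", uDoubleAcute, "UTF-8"));      // xml=%C5%B0
    }
}
```

The server also has to decode the request body as UTF-8 for the round trip to work; how to configure that depends on the servlet container and is not covered here.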

dstefanox answered Oct 16 '22 10:10
