Jackson->Jackson + HttpPost = "Invalid UTF-8 middle byte", Setting Mime and Encoding

I'm using the Apache HttpClient library and Jackson in my client. When I post JSON to the server, I get this error:

org.codehaus.jackson.JsonParseException: Invalid UTF-8 middle byte 0x65
 at [Source: HttpInputOverHTTP@22a4ac95; line: 1, column: 81]
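Jackson's message already hints at the cause. The minimal JDK-only sketch below (no HttpClient or Jackson involved; the class name MiddleByteDemo is made up) assumes the body was sent as ISO-8859-1. In that encoding the ß in "Straße" becomes the single byte 0xDF, which a UTF-8 reader treats as the first byte of a two-byte sequence; the byte after it is e (0x65), which cannot continue that sequence. It illustrates the error message only and does not by itself show where the wrong encoding happens.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class MiddleByteDemo {
    // Strict UTF-8 check: REPORT makes the decoder throw instead of substituting U+FFFD
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String word = "Stra\u00DFe"; // "Straße"
        byte[] latin1 = word.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = word.getBytes(StandardCharsets.UTF_8);

        // In UTF-8, 0xDF (110xxxxx) starts a 2-byte sequence, so the next byte must be 10xxxxxx.
        // Here it is 'e' (0x65 = 01100101): the invalid "middle byte" from the error message.
        System.out.printf("ISO-8859-1: 0x%02X followed by 0x%02X%n", latin1[4] & 0xFF, latin1[5] & 0xFF);
        System.out.println("ISO-8859-1 bytes valid UTF-8? " + isValidUtf8(latin1));
        System.out.println("UTF-8 bytes valid UTF-8?      " + isValidUtf8(utf8));
    }
}
```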

If I don't set any headers, I get an invalid media type error, which makes sense.

If I use curl with the same headers, the server accepts the request, so I think the server is OK (it's just a coincidence that it also uses Jackson).

This is the document. I've hard-coded it as a Java literal and written its only non-ASCII character, ß, as the \u00DF escape, so there's no other place for mangling to happen:

// "Stra\u00DFe" = "Straße"
static String TINY_UTF8_DOC = "[{ \"id\" : \"2\", \"fields\" : { \"subject\" : [{ \"name\" : \"subject\", \"value\" : \"Stra\u00DFe\" }] } }]";

Here's the code I've been using, with the various attempts commented out:

HttpClient httpClient = new DefaultHttpClient();
HttpPost post = new HttpPost( url );

// Attempt A
// post.setEntity(  new StringEntity( content )  );

// Attempt B
// post.setEntity(  new StringEntity( content )  );
// post.setHeader("Content-Type", "application/json; charset=utf-8");

// Attempt C
// post.setEntity(  new StringEntity( content, ContentType.create("application/json") )  );

// Attempt D
// post.setEntity(  new StringEntity( content, ContentType.create("application/json; charset=UTF-8") )  );

// Attempt F
// post.setEntity(  new StringEntity( content, ContentType.create("application/json; charset=utf-8") )  );

// Attempt G
// StringEntity params = new StringEntity( content );
// params.setContentType("application/json; charset=UTF-8");
// post.setEntity(params);

// And then send to server
HttpResponse response = httpClient.execute( post );
int code = response.getStatusLine().getStatusCode();
// ...etc...
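One way to read the lettered attempts: in every one of them the entity ends up without a charset, so the header text and the actual bytes can disagree. The sketch below is a simplified, hypothetical model (EntityCharsetModel and bodyBytes are made-up names, not HttpClient API). It assumes the HttpClient 4.3 StringEntity behaviour as I understand it: when the ContentType carries no charset, the string is encoded as ISO-8859-1, whatever Content-Type header is attached afterwards.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model of how the body bytes get chosen; NOT the real StringEntity.
public class EntityCharsetModel {
    // Assumption: the fallback used when no charset is known
    static final Charset DEFAULT_CONTENT_CHARSET = StandardCharsets.ISO_8859_1;

    // charset is null when the ContentType carries no charset (the failing attempts)
    static byte[] bodyBytes(String content, Charset charset) {
        return content.getBytes(charset != null ? charset : DEFAULT_CONTENT_CHARSET);
    }

    // Upper-case hex, space separated, for easy comparison
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String value = "Stra\u00DFe";
        // No charset: the body is ISO-8859-1 even if the header says UTF-8
        System.out.println("no charset: " + hex(bodyBytes(value, null)));
        // Explicit UTF-8 charset: ß becomes the two bytes C3 9F
        System.out.println("UTF-8     : " + hex(bodyBytes(value, StandardCharsets.UTF_8)));
    }
}
```

Setting a header (Attempt B) or calling setContentType (Attempt G) changes only the label in this model, never the bytes already produced.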

Other weird things I've noticed:

  • For a while this behaved differently in Eclipse on the Mac than when running a .jar on Linux. Clearly that's a symptom of platform-specific encoding or decoding, but I don't know where. Ironically, it broke when I set Eclipse to treat source code as UTF-8 (vs. ASCII). I suspect this is an important clue, but I'm not sure where it fits.
  • Sometimes I've seen 4 bytes in the stream instead of 2. That might have been a different encoding problem when writing to disk, although I was explicitly setting UTF-8 for file I/O.
  • When I look at the string entity in the debugger, I see the bytes, but the 8-bit character is a negative number. When you run through the two's complement math, it still works out to the correct Unicode code point (U+00DF), so it's nominally OK, assuming HttpClient isn't buggy.
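The debugger observation is itself a clue. In UTF-8, ß is two bytes (0xC3 0x9F, which a Java debugger shows as -61 and -97); in ISO-8859-1 it is the single byte 0xDF, shown as -33. So a single negative byte whose unsigned value equals the code point suggests the entity was encoded as ISO-8859-1, not UTF-8. The 4-byte case fits a different, common mistake, encoding to UTF-8 twice, though that is only a guess about the disk-writing problem. A small JDK-only sketch (ByteClues and doubleEncode are hypothetical names):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteClues {
    // Double encoding: UTF-8 bytes misread as ISO-8859-1 text, then encoded as UTF-8 again
    static byte[] doubleEncode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sharpS = "\u00DF"; // ß, U+00DF

        // Java bytes are signed, so any byte >= 0x80 shows up negative in a debugger
        byte[] latin1 = sharpS.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = sharpS.getBytes(StandardCharsets.UTF_8);
        System.out.println("ISO-8859-1: " + Arrays.toString(latin1)); // [-33]
        System.out.println("UTF-8     : " + Arrays.toString(utf8));   // [-61, -97]
        System.out.println("unsigned  : " + (latin1[0] & 0xFF));      // 223 == 0xDF

        // One plausible source of 4 bytes where 2 were expected
        System.out.println("double    : " + Arrays.toString(doubleEncode(sharpS))); // [-61, -125, -62, -97]
    }
}
```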

I'm really out of ideas. As I said, it works with curl, so I think the server is OK.

Edit:

curl works when posting to the server, but I can't share the server code. Someone pointed out that curl isn't written in Java and may behave differently, so the server code could still be suspect.

So as a further test, the code below does NOT use the Apache httpclient library, and DOES work when posting to the server. This proves that the server is fine and there's still something wrong with how I'm using the Apache library on the client side (or maybe it's buggy).

Code that doesn't use Apache HttpClient, which does work:

import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class PostUtf8 {
    static String POST_URL = "http://...";

    // \u00DF = LATIN SMALL LETTER SHARP S, looks like letter B
    static String TINY_UTF8_DOC = "[{ \"id\" : \"2\", \"fields\" : { \"subject\" : [{ \"name\" : \"subject\", \"value\" : \"Stra\u00DFe\" }] } }]";

    public static void main( String [] args ) throws Exception {
        System.out.println( "Posting to " + POST_URL );
        URL url = new URL( POST_URL );
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // The header declares UTF-8...
        conn.setRequestProperty( "Content-Type", "application/json; charset=UTF-8" );
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        // ...and the writer really encodes the body as UTF-8, so header and bytes agree.
        // try-with-resources flushes and closes the request body before reading the response.
        try (Writer wout = new OutputStreamWriter( conn.getOutputStream(), StandardCharsets.UTF_8 )) {
            wout.write( TINY_UTF8_DOC );
        }
        int result = conn.getResponseCode();
        System.out.println( "Result = " + result );
    }
}
asked May 09 '14 at 21:05 by Mark Bennett


1 Answer

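It looks like the problem is how the ContentType argument to HttpClient's StringEntity constructor is created. As far as I can tell from the HttpClient 4.3 sources, ContentType.create(String) does not parse its argument: the whole string "application/json; charset=UTF-8" becomes the MIME type and the charset stays null. StringEntity then falls back to ISO-8859-1, so the header still says charset=UTF-8 while ß goes out as the single byte 0xDF. The next byte is e (0x65), which is not a valid UTF-8 continuation byte, hence "Invalid UTF-8 middle byte 0x65". new StringEntity(content), used in attempts A, B and G, has the same ISO-8859-1 fallback, and setting the header afterwards only changes the label, not the bytes.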

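Passing the ContentType.APPLICATION_JSON constant instead (MIME type application/json with its charset set to UTF-8) makes everything work. ContentType.create("application/json", Consts.UTF_8) or ContentType.create("application/json", "UTF-8") should be equivalent, since both set the charset explicitly.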
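The difference between the failing and working calls comes down to whether anything parses the charset parameter after the ';'. As a rough illustration, here is a hypothetical JDK-only helper (ContentTypeCharset and charsetOf are made-up names, not HttpClient API) that does that parsing. If you want to keep a full header string, HttpClient 4.x's ContentType.parse(String) should do the equivalent job. The sketch deliberately ignores edge cases such as semicolons inside quoted values, and Charset.forName throws for unknown charset names.

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class ContentTypeCharset {
    // Hypothetical helper: returns the charset parameter of a Content-Type value, or null if absent
    static Charset charsetOf(String headerValue) {
        String[] parts = headerValue.split(";");
        // parts[0] is the bare MIME type; parameters follow
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (name.equals("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional quotes, e.g. charset="UTF-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return Charset.forName(value);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("application/json; charset=utf-8")); // UTF-8
        System.out.println(charsetOf("application/json"));                // null
    }
}
```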

Here is an example posting the JSON string to a public http service that echoes the request back to the client:

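import java.io.IOException;

import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

// Jackson 2.x; with Jackson 1.x use org.codehaus.jackson.JsonNode and org.codehaus.jackson.map.ObjectMapper
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class HttpClientEncoding {

    static String TINY_UTF8_DOC = "[{ \"id\" : \"2\", \"fields\" : { \"subject\" : " +
            "[{ \"name\" : \"subject\", \"value\" : \"Stra\u00DFe\" }] } }]";

    public static void main(String[] args) throws IOException {
        HttpClient httpClient = new DefaultHttpClient();
        // httpbin.org echoes the request back as JSON
        HttpPost post = new HttpPost("http://httpbin.org/post");
        // Works: APPLICATION_JSON carries an explicit UTF-8 charset, so the body is UTF-8 encoded
        StringEntity entity = new StringEntity(TINY_UTF8_DOC, ContentType.APPLICATION_JSON);
        // Fails in 4.3: same header text, but the charset is not parsed out, so the body is ISO-8859-1
        //StringEntity entity = new StringEntity(TINY_UTF8_DOC, ContentType.create("application/json; charset=utf-8"));
        post.setEntity(entity);
        HttpResponse response = httpClient.execute(post);
        String result = EntityUtils.toString(response.getEntity());
        System.out.println(result);
        // Parse the echo and print the "value" field as httpbin decoded it from our body
        ObjectMapper mapper = new ObjectMapper();
        JsonNode node = mapper.readValue(result, JsonNode.class);
        System.out.println(node.get("json").get(0).get("fields").get("subject").get(0).get("value").asText());
    }
}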

Output:

{
  "origin": "46.9.77.167",
  "url": "http://httpbin.org/post",
  "args": {},
  "data": "[{ \"id\" : \"2\", \"fields\" : { \"subject\" : [{ \"name\" : \"subject\", \"value\" : \"Stra\u00dfe\" }] } }]",
  "files": {},
  "form": {},
  "headers": {
    "Content-Length": "90",
    "User-Agent": "Apache-HttpClient/4.3.3 (java 1.5)",
    "Host": "httpbin.org",
    "Connection": "close",
    "X-Request-Id": "c02864cc-a1d6-434c-9cff-1f6187ceb080",
    "Content-Type": "application/json; charset=UTF-8"
  },
  "json": [
    {
      "id": "2",
      "fields": {
        "subject": [
          {
            "value": "Stra\u00dfe",
            "name": "subject"
          }
        ]
      }
    }
  ]
}
Straße
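The echoed headers double as a check that the body really was UTF-8. The document is 89 characters, all ASCII except ß, so an ISO-8859-1 body would be 89 bytes, while UTF-8 spends two bytes on ß and gives the Content-Length of 90 shown above. A quick JDK-only sketch of that arithmetic (ContentLengthCheck is a made-up class name):

```java
import java.nio.charset.StandardCharsets;

public class ContentLengthCheck {
    static final String TINY_UTF8_DOC = "[{ \"id\" : \"2\", \"fields\" : { \"subject\" : "
            + "[{ \"name\" : \"subject\", \"value\" : \"Stra\u00DFe\" }] } }]";

    public static void main(String[] args) {
        // Character count vs. encoded byte counts for the same string
        System.out.println("chars      : " + TINY_UTF8_DOC.length());                                     // 89
        System.out.println("ISO-8859-1 : " + TINY_UTF8_DOC.getBytes(StandardCharsets.ISO_8859_1).length); // 89
        System.out.println("UTF-8      : " + TINY_UTF8_DOC.getBytes(StandardCharsets.UTF_8).length);      // 90
    }
}
```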
answered Oct 26 '22 at 06:10 by Alexey Gavrilov