Java Decode double encoded utf-8 char

Tags: java, encoding, utf-8

I am parsing a websocket message and, due to a bug in a specific socket.io version (unfortunately I don't have control over the server side), some of the payload is double encoded as UTF-8:

The correct value would be Wrocławskiej (note the l letter which is LATIN SMALL LETTER L WITH STROKE) but I actually get back WrocÅawskiej.

I already tried to decode and re-encode it in Java:

String str = new String(wrongEncoded.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);

Unfortunately the string stays the same. Any idea how to do a double decoding in Java? I saw a Python version where they convert it to raw_unicode first and then parse it again, but I don't know whether this works or whether there is a similar solution for Java. I have already read through a couple of posts on this topic, but none helped.

Edit: To clarify, in Fiddler I receive the following byte sequence for the above-mentioned word:

WrocÃÂawskiej

byte[] arrOutput = { 0x57, 0x72, 0x6F, 0x63, 0xC3, 0x85, 0xC2, 0x82, 0x61, 0x77, 0x73, 0x6B, 0x69, 0x65, 0x6A };

662

asked Jun 29 '17 16:06

Christoph S

1 Answer

Your text was encoded to UTF-8, those bytes were then interpreted as ISO-8859-1, and the result was re-encoded to UTF-8.

As Unicode code points, Wrocławskiej is: 0057 0072 006f 0063 0142 0061 0077 0073 006b 0069 0065 006a
Encoded as UTF-8 it is: 57 72 6f 63 c5 82 61 77 73 6b 69 65 6a

In ISO-8859-1, c5 is Å and 82 is undefined (Java decodes it to the unprintable control character U+0082, so the byte is not lost).
As ISO-8859-1, those bytes are: WrocÅawskiej
Re-encoded as UTF-8, that string becomes: 57 72 6f 63 c3 85 c2 82 61 77 73 6b 69 65 6a
Those are likely the bytes you are receiving.
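
If you want to check the byte sequences above yourself, here is a minimal sketch that reproduces them. The class and method names (DoubleEncodingDemo, hex, doubleEncode) are made up for illustration, and it assumes the server really does misread the bytes as ISO-8859-1, which is only my guess based on the bytes you posted.

```java
import java.nio.charset.StandardCharsets;

class DoubleEncodingDemo {
    // Formats bytes as space-separated lowercase hex, matching the listings above.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Simulates the suspected bug (an assumption, not confirmed server behaviour):
    // UTF-8 bytes are misread as ISO-8859-1 and the result is encoded as UTF-8 again.
    static byte[] doubleEncode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Wroc\u0142awskiej"; // \u0142 is LATIN SMALL LETTER L WITH STROKE
        // Single UTF-8 encoding: 57 72 6f 63 c5 82 61 77 73 6b 69 65 6a
        System.out.println(hex(original.getBytes(StandardCharsets.UTF_8)));
        // Double encoding: 57 72 6f 63 c3 85 c2 82 61 77 73 6b 69 65 6a
        System.out.println(hex(doubleEncode(original)));
    }
}
```

The second line of output matches the byte array from the question, which is why I think this is what happened.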

So, to undo that, you need:

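// bytes holds the raw payload as received from the socket
String s = new String(bytes, StandardCharsets.UTF_8);

// fix "double encoding": map each char back to a single byte, then decode those bytes as UTF-8
s = new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);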
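
For completeness, here is the same fix as a small self-contained sketch, run against the byte sequence from the question. The names DoubleEncodingFix and fixDoubleEncoding are hypothetical, and the (byte) casts are only there because Java bytes are signed.

```java
import java.nio.charset.StandardCharsets;

class DoubleEncodingFix {
    // Undoes one layer of "UTF-8 read as ISO-8859-1, then encoded as UTF-8 again".
    static String fixDoubleEncoding(byte[] received) {
        // First decode: yields the mojibake string (Å followed by U+0082).
        String mojibake = new String(received, StandardCharsets.UTF_8);
        // ISO-8859-1 maps each char in U+0000..U+00FF back to exactly one byte,
        // which recovers the original UTF-8 bytes.
        byte[] originalUtf8 = mojibake.getBytes(StandardCharsets.ISO_8859_1);
        // Second decode: the text as it was meant to be.
        return new String(originalUtf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Byte sequence from the question.
        byte[] received = {
            0x57, 0x72, 0x6F, 0x63, (byte) 0xC3, (byte) 0x85, (byte) 0xC2, (byte) 0x82,
            0x61, 0x77, 0x73, 0x6B, 0x69, 0x65, 0x6A
        };
        String fixed = fixDoubleEncoding(received);
        System.out.println(fixed.equals("Wroc\u0142awskiej")); // prints true
    }
}
```

Your original attempt changed nothing because getBytes(UTF_8) followed by new String(..., UTF_8) is a lossless round trip; the ISO_8859_1 step in the middle is what strips the extra layer. Only apply this to values you know are double encoded: getBytes(ISO_8859_1) replaces any character above U+00FF with '?', so a correctly encoded string containing such characters would be damaged.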

153

answered Oct 22 '22 07:10

Andreas
