
The length of a compressed Java String is not equal to the content-length when it is sent as a WebSocket message

I am trying to reduce bandwidth consumption by compressing the JSON String I am sending through the WebSocket from my Spring Boot application to the browser client (this is on top of the permessage-deflate WebSocket extension). This scenario uses the following JSON String, which has a length of 383 characters:

{"headers":{},"body":{"message":{"errors":{"password":"Password length must be at least 8 characters.","retype":"Retype Password cannot be null.","username":"Username length must be between 6 to 64 characters."},"links":[],"success":false,"target":{"password":"","retype":"","username":""}},"target":"/user/session/signup"},"statusCode":"UNPROCESSABLE_ENTITY","statusCodeValue":422}

To benchmark, I send both the compressed and the uncompressed String from the server like so:

Object response = …;

SimpMessageHeaderAccessor simpHeaderAccessor =
    SimpMessageHeaderAccessor.create(SimpMessageType.MESSAGE);
simpHeaderAccessor.setSessionId(sessionId);
simpHeaderAccessor.setContentType(new MimeType("application", "json",
    StandardCharsets.UTF_8));
simpHeaderAccessor.setLeaveMutable(true);
// Sends the uncompressed message.
messagingTemplate.convertAndSendToUser(sessionId, uri, response,
    simpHeaderAccessor.getMessageHeaders());

ObjectMapper mapper = new ObjectMapper();
String jsonString;

try {
    jsonString = mapper.writeValueAsString(response);
}
catch(JsonProcessingException e) {
    jsonString = response.toString();
}

log.info("The payload is application/json.");
log.info("uncompressed payload (" + jsonString.length() + " character):");
log.info(jsonString);

String lzStringCompressed = LZString.compress(jsonString);
simpHeaderAccessor = SimpMessageHeaderAccessor.create(SimpMessageType.MESSAGE);
simpHeaderAccessor.setSessionId(sessionId);
simpHeaderAccessor.setContentType(new MimeType("text", "plain",
    StandardCharsets.UTF_8));
simpHeaderAccessor.setLeaveMutable(true);
// Sends the compressed message.
messagingTemplate.convertAndSendToUser(sessionId, uri, lzStringCompressed,
    simpHeaderAccessor.getMessageHeaders());

log.info("The payload is text/plain.");
log.info("compressed payload (" + lzStringCompressed.length() + " character):");
log.info(lzStringCompressed);

This logs the following lines in the Java console:

The payload is application/json.
uncompressed payload (383 character):
{"headers":{},"body":{"message":{"errors":{"password":"Password length must be at least 8 characters.","retype":"Retype Password cannot be null.","username":"Username length must be between 6 to 64 characters."},"links":[],"success":false,"target":{"password":"","retype":"","username":""}},"target":"/user/session/signup"},"statusCode":"UNPROCESSABLE_ENTITY","statusCodeValue":422}
The payload is text/plain.
compressed payload (157 character):
??????????¼??????????????p??!-??7??????????????????????????????????u??????????????????????·}???????????????????????????????????????/?┬R??b,??????m??????????

The browser then receives the two messages sent by the server, which are captured by this JavaScript:

stompClient.connect({}, function(frame) {
    stompClient.subscribe(stompClientUri, function(payload) {
        try {
            JSON.parse(payload.body);
            console.log("The payload is application/json.");
            console.log("uncompressed payload (" + payload.body.length + " character):");
            console.log(payload.body);

            payload = JSON.parse(payload.body);
        } catch (e) {
            try {
                payload = payload.body;
                console.log("The payload is text/plain.");
                console.log("compressed payload (" + payload.length + " character):");
                console.log(payload);

                var decompressPayload = LZString.decompress(payload);
                console.log("decompressed payload (" + decompressPayload.length + " character):");
                console.log(decompressPayload);

                payload = JSON.parse(decompressPayload);
            } catch (e) {
            } finally {
            }
        } finally {
        }
    });
});

This displays the following lines in the browser's debug console:

The payload is application/json.
uncompressed payload (383 character):
{"headers":{},"body":{"message":{"errors":{"password":"Password length must be at least 8 characters.","retype":"Retype Password cannot be null.","username":"Username length must be between 6 to 64 characters."},"links":[],"success":false,"target":{"password":"","retype":"","username":""}},"target":"/user/session/sign-up"},"statusCode":"UNPROCESSABLE_ENTITY","statusCodeValue":422}
The payload is text/plain.
compressed payload (157 character):
ᯡࠥ䅬ࢀጨᎡ乀ஸ̘͢¬ߑ䁇啰˸⑱ᐣ䱁ሢ礒⽠݉ᐮ皆⩀p瑭漦!-䈠ᷕ7ᡑ刡⺨狤灣મ啃嵠ܸ䂃ᡈ硱䜄ቀρۯĮニᴴဠ䫯⻖֑点⇅劘畭ᣔ奢⅏㛥⡃Ⓛ撜u≂㥋╋ၲ⫋䋕᪒丨ಸ䀭䙇Ꮴ吠塬昶⬻㶶Т㚰ͻၰú}㙂᥸沁⠈ƹ⁄᧸㦓ⴼ䶨≋愐㢡ᱼ溜涤簲╋㺮橿䃍砡瑧ᮬ敇⼺ℙ滆䠢榵ⱀ盕ີ‣Ш眨રą籯/ሤÂR儰Ȩb,帰Ћ愰䀥․䰂m㛠ளǀ䀭❖⧼㪠Ө柀䀠 
decompressed payload (383 character):
{"headers":{},"body":{"message":{"errors":{"password":"Password length must be at least 8 characters.","retype":"Retype Password cannot be null.","username":"Username length must be between 6 to 64 characters."},"links":[],"success":false,"target":{"password":"","retype":"","username":""}},"target":"/user/session/sign-up"},"statusCode":"UNPROCESSABLE_ENTITY","statusCodeValue":422}

At this point I can verify that whatever String value my Spring Boot application compresses, the browser is able to decompress it and recover the original String. There is a problem, though. When I checked in the browser debugger whether the size of the transferred message was actually reduced, it told me that it wasn't.

Here is the raw uncompressed message (598B):

a["MESSAGE destination:/user/session/broadcast
content-type:application/json;charset=UTF-8
subscription:sub-0
message-id:5lrv4kl1-1
content-length:383

{"headers":{},"body":{"message":{"errors":{"password":"Password length must be at least 8 characters.","retype":"Retype Password cannot be null.","username":"Username length must be between 6 to 64 characters."},"links":[],"success":false,"target":{"password":"","retype":"","username":""}},"target":"/user/session/sign-up"},"statusCode":"UNPROCESSABLE_ENTITY","statusCodeValue":422}

While this is the raw compressed message (589B):

a["MESSAGE destination:/user/session/broadcast
content-type:text/plain;charset=UTF-8
subscription:sub-0
message-id:5lrv4kl1-2
content-length:425

á¯¡à ¥ä¬à¢á¨á¡ä¹à®¸Ì͢¬ßäå°Ë¸â±á£ä±á¢ç¤â½Ýá®çâ©pç­æ¼¦!-ä á·7á¡å¡âº¨ç¤ç£àª®ååµÜ¸äá¡ç¡±äáÏۯĮãá´´á䫯â»Öç¹âåç­á£å¥¢âã¥â¡âæuâã¥âá²â«äáªä¸¨à²¸ä­äá¤å塬æ¶â¬»ã¶¶Ð¢\u2029ã°Í»á°Ãº}ã᥸æ²âƹâ᧸ã¦â´¼ä¶¨âæ㢡ᱼæºæ¶¤ç°²â㺮橿äç¡ç§á®¬æ⼺âæ»ä¢æ¦µâ±çີâ£Ð¨ç¨àª°Ä籯/á¤ÃRå°È¨b,帰Ðæ°ä¥â¤ä°mãளÇä­â⧼㪠Өæä  \u0000"]

The debug console indicates that the uncompressed message was transferred with a size of 598B, with 383 characters as the message payload's size (indicated by the content-length header). The compressed message, on the other hand, was transferred with a total size of 589B, 9B smaller than the uncompressed one, with 425 characters as the message payload's size. I have several questions:

  1. Is the content-length of the STOMP message indicated in bytes, or in characters?
  2. Why is the content-length of the uncompressed message, which is 383, smaller than that of the compressed message, which is 425?
  3. Does this mean that reducing the character length does not necessarily mean reducing the size?
  4. Why is the content-length of the compressed message, which is 425, not the same as the value returned in the Java console (using lzStringCompressed.length()), which is 157, considering that the uncompressed message was transferred with a content-length of 383, which is the same as its length in the Java console? Both are transferred with charset=UTF-8 encoding.
  5. Why is the content-length of the compressed message, which is 425, not the same as the value returned in the Java console (using lzStringCompressed.length()), which is 157, while the JavaScript code payload.length returns 157, not 425?
  6. If it really gets bloated during the transfer, why does the message with application/json remain unaffected while only the text/plain one gets bloated?

While the 9B difference is still a difference, I am reconsidering whether the overhead of compressing/decompressing the message is worth keeping. I will have to test other String values for that.

asked Sep 18 '20 08:09 by Gideon


1 Answer

All of these questions are closely related.

  1. Is the content-length of the STOMP message indicated in bytes, or in characters?

As you can see in the STOMP specification:

All frames MAY include a content-length header. This header is an octet count for the length of the message body....

From a STOMP perspective the body is a byte array and the headers content-type and content-length determine what the body contains and how it should be interpreted.
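
As a rough illustration of that octet count, here is a minimal Java sketch that assembles a simplified MESSAGE frame by hand. The class and method names are hypothetical, and the frame leaves out the subscription and message-id headers that a real broker adds; it is not how Spring builds its frames, only a way to see where the number in content-length comes from.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only: not part of Spring or any STOMP library.
final class StompFrameSketch {

    // Builds a simplified MESSAGE frame; content-length is the body's size in bytes.
    static byte[] messageFrame(String destination, String body) {
        byte[] bodyBytes = body.getBytes(StandardCharsets.UTF_8);
        String head = "MESSAGE\n"
                + "destination:" + destination + "\n"
                + "content-type:text/plain;charset=UTF-8\n"
                + "content-length:" + bodyBytes.length + "\n"
                + "\n";
        byte[] headBytes = head.getBytes(StandardCharsets.UTF_8);

        // One extra byte at the end stays 0: the NULL octet that terminates a frame.
        byte[] frame = new byte[headBytes.length + bodyBytes.length + 1];
        System.arraycopy(headBytes, 0, frame, 0, headBytes.length);
        System.arraycopy(bodyBytes, 0, frame, headBytes.length, bodyBytes.length);
        return frame;
    }

    public static void main(String[] args) {
        // Three chars outside ASCII: String.length() is 3, but the header says 9.
        byte[] frame = messageFrame("/user/session/broadcast", "\u1BE1\u0825\u416C");
        System.out.println(new String(frame, StandardCharsets.UTF_8));
    }
}
```

The header value is taken from the encoded byte array, never from String.length(), which is why the two numbers can diverge.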

  2. Why is the content-length of the uncompressed message, which is 383, smaller than that of the compressed message, which is 425?

Because of the conversion to UTF-8 which is carried out when you send the information to the client in your STOMP server.

You have a message, a String, and this message is composed of a series of characters.

Without going into great detail, internally every char in Java is a UTF-16 code unit.

To represent these code units in a particular character encoding, UTF-8 in your case, a variable number of bytes may be required: from one to four per code point, and up to three for any character that fits in a single Java char.

In the case of the uncompressed message, you have 383 chars, pure ASCII, which will be encoded to UTF-8 with one byte per char. This is why you obtain the same value in the content-length header.

But this is not the case for the compressed message. Compression produces 157 chars (Unicode code units) whose values are essentially arbitrary, and as raw 16-bit units they occupy fewer bytes than the original message. But then they are encoded in UTF-8. Some of these 157 chars will be represented with one byte, as was the case with the original message, but because the values are arbitrary, most of them need two or three bytes each (four for a surrogate pair). This is why you obtain a number of bytes greater than that of the uncompressed message.
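
The numbers in the question are consistent with this: 425 bytes for 157 chars is about 2.7 bytes per char. The following Java sketch (hypothetical class and method names, not taken from the question's code) counts how many code points of a string need one, two, three or four bytes in UTF-8. It assumes well-formed input; a String containing unpaired surrogates is encoded differently by getBytes.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; assumes the input has no unpaired surrogates.
final class Utf8Breakdown {

    // Number of UTF-8 bytes needed for one Unicode code point.
    static int utf8Width(int codePoint) {
        if (codePoint < 0x80) return 1;      // ASCII
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;   // rest of the Basic Multilingual Plane
        return 4;                            // supplementary planes
    }

    // counts[w] holds how many code points of s need w bytes in UTF-8.
    static int[] widthHistogram(String s) {
        int[] counts = new int[5];
        s.codePoints().forEach(cp -> counts[utf8Width(cp)]++);
        return counts;
    }

    // Total encoded size, derived from the histogram.
    static int utf8Length(String s) {
        int[] c = widthHistogram(s);
        return c[1] + 2 * c[2] + 3 * c[3] + 4 * c[4];
    }

    public static void main(String[] args) {
        String sample = "A\u00FA\u1BE1"; // one 1-byte, one 2-byte, one 3-byte char
        int[] c = widthHistogram(sample);
        System.out.println("1-byte: " + c[1] + ", 2-byte: " + c[2] + ", 3-byte: " + c[3]);
        System.out.println("computed: " + utf8Length(sample)
                + ", getBytes: " + sample.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Running widthHistogram on the compressed String on the server would show how many of its chars fall into each bucket.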

  3. Does this mean that reducing the character length does not necessarily mean reducing the size?

In general, compressing your data gives you fewer bytes of raw information than you started with.

If the payload is large enough to make compression worthwhile, and you are able to send the compressed data as raw binary, similar to when a server sends a response with Content-Encoding: gzip or deflate, it can bring you a great benefit.

But if the client library can only handle text messages and not binary ones, as is the case with SockJS for instance, the encoding overhead may, as you can see, cancel out the gain.

To mitigate the problem, you can encode the compressed bytes with an intermediate text encoding such as Base64, which yields roughly 1.33 times the number of compressed bytes (4 ASCII characters for every 3 bytes): if this value is less than the number of bytes without compression, compressing the message may be worth it.
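
That check can be sketched in Java as follows. Note the assumptions: it uses java.util.zip.Deflater rather than the LZString library from the question, the class and method names are made up for the example, and the JSON is a shortened stand-in for the real payload, so the sizes it prints are only indicative.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.Deflater;

// Illustrative sketch: Deflater stands in for LZString here.
final class Base64SizeCheck {

    // Compresses the input with zlib/DEFLATE and returns the raw compressed bytes.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Padded Base64 emits 4 ASCII characters per group of 3 input bytes, rounded up.
    static int base64Length(int rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    public static void main(String[] args) {
        // Shortened, hypothetical stand-in for the JSON in the question.
        String json = "{\"errors\":{\"password\":\"Password length must be at least 8 characters.\","
                + "\"retype\":\"Retype Password cannot be null.\"},\"success\":false}";
        byte[] raw = json.getBytes(StandardCharsets.UTF_8);
        byte[] compressed = deflate(raw);
        String encoded = Base64.getEncoder().encodeToString(compressed);

        System.out.println("uncompressed bytes: " + raw.length);
        System.out.println("compressed bytes:   " + compressed.length);
        System.out.println("Base64 characters:  " + encoded.length());
        System.out.println("worth sending:      " + (encoded.length() < raw.length));
    }
}
```

Because Base64 output is pure ASCII, its character count equals its UTF-8 byte count, so the comparison in the last line is a comparison of what would actually go on the wire as the message body.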

In any case, as indicated in the specification, STOMP is text based but also allows for the transmission of binary messages. Also, it indicates that the default encoding for STOMP is UTF-8, but it supports the specification of alternative encodings for message bodies.

If you are using stomp-js, as your code suggests, it seems possible to process binary messages as well, as its documentation indicates. Please be aware that I have not used this library myself.

Basically, your server must send the raw bytes with a content-type header whose value is application/octet-stream.

This information can then be processed on the client side by the library with something similar to this:

    // within message callback
    if (message.headers['content-type'] === 'application/octet-stream') {
      // message is binary
      // call message.binaryBody 
    } else {
      // message is text
      // call message.body
    }

If this works and you can send the compressed data this way, compression could, as indicated previously, bring you a great benefit.

  4. Why is the content-length of the compressed message, which is 425, not the same as the value returned in the Java console (using lzStringCompressed.length()), which is 157, considering that the uncompressed message was transferred with a content-length of 383, which is the same as its length in the Java console? Both are transferred with charset=UTF-8 encoding.

Consider the Javadoc of the length method of the String class:

Returns the length of this string. The length is equal to the number of Unicode code units in the string.

As you can see, the length method gives you the number of Unicode code units in the String, whereas the content-length header gives you the number of bytes required to represent them in UTF-8, as indicated previously.

In fact, calculating the length of a string can be a tricky task.
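
To make the different notions of "length" concrete, here is a small Java sketch (the class and method names are only for illustration) that reports code units, code points and UTF-8 bytes for a few sample strings.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch comparing three ways of measuring a String.
final class StringLengths {

    // What String.length() returns: UTF-16 code units.
    static int codeUnits(String s) {
        return s.length();
    }

    // Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Bytes on the wire when the String is encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "abc",           // ASCII: 3 units, 3 code points, 3 bytes
            "\u00FA",        // 1 unit, 1 code point, 2 bytes
            "\u1BE1",        // 1 unit, 1 code point, 3 bytes
            "\uD83D\uDE00"   // surrogate pair: 2 units, 1 code point, 4 bytes
        };
        for (String s : samples) {
            System.out.println(codeUnits(s) + " code units, "
                    + codePoints(s) + " code points, "
                    + utf8Bytes(s) + " UTF-8 bytes");
        }
    }
}
```

Only for pure ASCII text do all three measures agree, which is the situation of the uncompressed JSON message.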

  5. Why is the content-length of the compressed message, which is 425, not the same as the value returned in the Java console (using lzStringCompressed.length()), which is 157, while the JavaScript code payload.length returns 157, not 425?

Because, as you can see in the documentation, length in JavaScript also indicates the length of the String object in UTF-16 code units:

The length property of a String object contains the length of the string, in UTF-16 code units. length is a read-only data property of string instances.

  6. If it really gets bloated during the transfer, why does the message with application/json remain unaffected while only the text/plain one gets bloated?

As mentioned above, it has nothing to do with the Content-Type but with the encoding of the information: the JSON message is pure ASCII, so every one of its characters takes exactly one byte in UTF-8.

answered Oct 17 '22 07:10 by jccampanero