
Why does Closure Compiler insist on adding more bytes?

If I give Closure Compiler something like this:

window.array = '0123456789'.split('');

It "compiles" it to this:

window.array="0,1,2,3,4,5,6,7,8,9".split(",");

Now as you can tell, that's bigger. Is there any reason why Closure Compiler is doing this?

asked Apr 18 '12 by qwertymk




2 Answers

I think this is what's going on, but I am by no means certain...

The code that causes the insertion of commas is tryMinimizeStringArrayLiteral in PeepholeSubstituteAlternateSyntax.java.

That method contains a list of characters that are likely to have a low Huffman encoding, and are therefore preferable as split characters. You can see the result if you try something like this:

"a b c d e f g".split(" "); //Uncompiled, split on spaces
"a,b,c,d,e,f,g".split(","); //Compiled, split on commas (same size)

The compiler will replace the character you try to split on with one it thinks is more favourable. It does so by iterating over a list of candidate delimiters and picking the first one that does not occur in any of the strings:

// These delimiters are chars that appears a lot in the program therefore
// probably have a small Huffman encoding.
NEXT_DELIMITER: for (char delimiter : new char[]{',', ' ', ';', '{', '}'}) {
  for (String cur : strings) {
    if (cur.indexOf(delimiter) != -1) {
      continue NEXT_DELIMITER;
    }
  }
  String template = Joiner.on(delimiter).join(strings);
  //...
}

In the above snippet you can see the array of characters the compiler claims to be optimal to split on. The comma is first (which is why in my space example above, the spaces have been replaced by commas).
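
To see that selection logic in isolation, here is a rough JavaScript re-implementation of the loop (a sketch only; the name pickDelimiter is mine, not the compiler's, and the real logic lives in PeepholeSubstituteAlternateSyntax.java):

// Sketch of the delimiter search, mirroring the Java snippet above.
function pickDelimiter(strings) {
  // Candidates ordered from most to least likely to have a short Huffman code.
  const candidates = [',', ' ', ';', '{', '}'];
  for (const delimiter of candidates) {
    // A delimiter is only usable if it occurs in none of the array elements.
    if (strings.every(s => s.indexOf(delimiter) === -1)) {
      return delimiter;
    }
  }
  return null; // No safe delimiter; the array literal is left alone.
}

pickDelimiter(['a', 'b', 'c', 'd', 'e', 'f', 'g']);  // ','  -> "a,b,c,d,e,f,g".split(",")
pickDelimiter(['a,', 'b', 'c', 'd', 'e', 'f', 'g']); // ' '  -> "a, b c d e f g".split(" ")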

I believe the insertion of commas in the case where the string to split on is the empty string may simply be an oversight. There does not appear to be any special treatment of this case, so it's treated like any other split call and each character is joined with the first appropriate character from the array shown in the above snippet.
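
You can check in a console that the longer, compiled form still produces exactly the same array, so the rewrite is at least safe:

const original = '0123456789'.split('');
const compiled = "0,1,2,3,4,5,6,7,8,9".split(",");

console.log(JSON.stringify(original) === JSON.stringify(compiled)); // true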


Another example of how the compiler deals with the split method:

"a,;b;c;d;e;f;g".split(";"); //Uncompiled, split on semi-colons
"a, b c d e f g".split(" "); //Compiled, split on spaces

This time, since the original string already contains a comma (and we don't want to split on the comma character), the comma can't be chosen from the array of low-Huffman-encoded characters, so the next best choice is selected (the space).


Update

Following some further research into this, it is definitely not a bug. This behaviour is actually by design, and in my opinion it's a very clever little optimisation, when you bear in mind that the Closure compiler tends to favour the speed of the compiled code over size.

Above I mentioned Huffman encoding a couple of times. The Huffman coding algorithm, explained very simply, assigns a weight to each character appearing in the text to be encoded. The weight is based on the frequency with which each character appears. These frequencies are used to build a binary tree, with the most common characters closest to the root. That means the most common characters get the shortest codes and are quicker to decode, since they are closer to the root of the tree.
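
As a very rough sketch (nothing like gzip's real implementation), you can compute Huffman code lengths from character frequencies and see that frequent characters such as the comma end up with codes no longer than those of rarer characters:

// Minimal Huffman sketch: derive code lengths from character frequencies.
// Purely illustrative; DEFLATE's actual Huffman coding is more involved.
function huffmanCodeLengths(text) {
  // Count how often each character occurs.
  const freq = {};
  for (const ch of text) freq[ch] = (freq[ch] || 0) + 1;

  // Start with one node per distinct character.
  let nodes = Object.keys(freq).map(ch => ({ chars: [ch], weight: freq[ch] }));

  // Repeatedly merge the two lightest nodes; each merge pushes the characters
  // inside those nodes one level deeper, i.e. one more bit in their code.
  const bits = {};
  while (nodes.length > 1) {
    nodes.sort((a, b) => a.weight - b.weight);
    const [a, b] = nodes.splice(0, 2);
    for (const ch of a.chars.concat(b.chars)) bits[ch] = (bits[ch] || 0) + 1;
    nodes.push({ chars: a.chars.concat(b.chars), weight: a.weight + b.weight });
  }
  return bits; // code length in bits per character
}

// In typical minified JavaScript the comma is very common, so its code is
// at least as short as that of rarer characters such as ';' or '{'.
const bits = huffmanCodeLengths('a,b,c,d,e,f,g;var x={y:1,z:2};');
console.log(bits[','], bits[';'], bits['{']);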

The Huffman algorithm is also a large part of the DEFLATE algorithm used by gzip, so if your web server is configured to use gzip, your users will benefit from this clever optimisation.
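
If you want to see the interaction with gzip yourself, here is a quick Node.js sketch comparing the raw and gzipped sizes of the two forms. On a snippet this tiny the benefit may not show up, because there is no surrounding code for the comma to share its short Huffman code with; the compiler is betting on the statistics of a whole minified bundle.

// Quick sketch: compare raw and gzipped sizes with Node's built-in zlib module.
const zlib = require('zlib');

const uncompiled = "window.array = '0123456789'.split('');";
const compiled   = 'window.array="0,1,2,3,4,5,6,7,8,9".split(",");';

const gzippedSize = s => zlib.gzipSync(Buffer.from(s)).length;

console.log('uncompiled:', uncompiled.length, 'raw bytes,', gzippedSize(uncompiled), 'gzipped');
console.log('compiled:  ', compiled.length, 'raw bytes,', gzippedSize(compiled), 'gzipped');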

answered Sep 21 '22 by James Allardice


This issue was fixed on Apr 20, 2012; see revision: https://code.google.com/p/closure-compiler/source/detail?r=1267364f742588a835d78808d0eef8c9f8ba8161

answered Sep 20 '22 by John