Java 15 introduced text blocks as a standard (non-preview) feature. They allow defining multi-line string literals without breaking code indentation, by stripping the common white space prefix from the lines. The algorithm is described in JEP 378.
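For reference, here is a minimal sketch of the unambiguous case, where every line is indented with spaces only. The class and method names are illustrative.

```java
public class TextBlockBasics {

    // The content lines and the closing delimiter share a common prefix of
    // 16 spaces. The compiler strips exactly that prefix as incidental
    // indentation, so the 4 extra spaces before "hello" are kept.
    static String paragraph() {
        return """
                <p>
                    hello
                </p>
                """;
    }

    public static void main(String[] args) {
        System.out.print(paragraph()); // "<p>\n    hello\n</p>\n"
    }
}
```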
But how exactly is the "common white space prefix" defined when lines are indented using a mix of tabs and spaces?
For example, what would the string value be in the following case (· means a space, → means a tab character):
→→····String text = """
→→····→line1
→········→line2
→····→→""";
A simple test with OpenJDK shows that the result string is:
line1
··→line2
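The observed result can also be reproduced at runtime with String.stripIndent() (available since Java 15). The sketch below assumes that the raw content of the text block consists of the three lines between the opening and closing delimiters, written with \t escapes so the source contains no literal tabs. The class name is illustrative.

```java
public class MixedIndentDemo {

    // Raw content of the example text block: everything between the line
    // break after the opening """ and the closing """, tabs written as \t.
    static final String RAW =
            "\t\t    \tline1\n" +   // 3 tabs + 4 spaces = 7 whitespace chars
            "\t        \tline2\n" + // 2 tabs + 8 spaces = 10 whitespace chars
            "\t    \t\t";           // closing-delimiter line: 7 whitespace chars

    public static void main(String[] args) {
        String result = RAW.stripIndent();
        // Make spaces and tabs visible, using the question's notation.
        System.out.println(result.replace(' ', '\u00B7').replace('\t', '\u2192'));
    }
}
```

It should print line1 and ··→line2, followed by an empty line, because the content ends with a line break before the closing delimiter.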
So it looks like javac simply counts the leading white space characters, spaces and tabs alike, and strips the smallest count from every line, treating spaces (0x20) and tabs (0x09) equally. Is this the expected behavior?
Side note: this is not a purely theoretical question; it has practical importance for a project with a large codebase and mixed space/tab indentation.
I found the answer, which I'd like to share.
The Java compiler does indeed treat spaces, tabs and all other whitespace characters equally, so the same number of whitespace characters (of any kind) is removed from every line.
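To make the "equal width" rule concrete: a tab counts as one whitespace character, exactly like a single space, regardless of how wide an editor renders it. A minimal sketch, using a hypothetical helper leadingWhitespace that mimics this counting (assuming whitespace as defined by Character.isWhitespace):

```java
public class TabEqualsSpace {

    // Hypothetical helper: number of leading whitespace characters,
    // counting a tab exactly like a space (1 each).
    static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        // With a tab width of 4, these two lines look aligned in an editor...
        String raw = "\tfoo\n    bar";
        System.out.println(leadingWhitespace("\tfoo"));   // 1
        System.out.println(leadingWhitespace("    bar")); // 4
        // ...but only min(1, 4) = 1 character is stripped from each line.
        System.out.println(raw.stripIndent().equals("foo\n   bar")); // true
    }
}
```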
Details: the javac tokenizer uses the String.stripIndent() method, which has the following implementation note:
This method treats all white space characters as having equal width. As long as the indentation on every line is consistently composed of the same character sequences, then the result will be as described above.
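For a codebase with mixed indentation, the implementation note suggests a practical check: the result is only predictable if the lines taken into account all start with the same character sequence. A minimal sketch of such a lint follows. hasConsistentIndent is a hypothetical helper, not a JDK API, and it assumes the caller has already extracted the raw text-block content (everything between the line break after the opening """ and the closing """).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class IndentConsistency {

    // Number of leading whitespace characters; a tab counts as 1, like a space.
    static int leading(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        return i;
    }

    // Hypothetical lint check: true if every counted line (each non-blank
    // line, plus the last line even when blank) starts with the same `min`
    // characters, i.e. an identical prefix is stripped, not just an equal count.
    static boolean hasConsistentIndent(String raw) {
        // If the content ends with a line break, the (empty) last line has zero
        // indentation, so nothing is stripped and there is nothing to compare.
        if (raw.isEmpty() || raw.endsWith("\n") || raw.endsWith("\r")) {
            return true;
        }
        List<String> lines = raw.lines().collect(Collectors.toList());
        List<String> counted = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (!line.isBlank() || i == lines.size() - 1) {
                counted.add(line);
            }
        }
        int min = counted.stream().mapToInt(IndentConsistency::leading).min().orElse(0);
        String prefix = counted.get(0).substring(0, min);
        for (String line : counted) {
            if (!line.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The question's example: 3 tabs + 4 spaces vs. 2 tabs + 8 spaces.
        String mixed = "\t\t    \tline1\n\t        \tline2\n\t    \t\t";
        // Consistently tab-indented content.
        String tabsOnly = "\t\tline1\n\t\t\tline2\n\t\t";
        System.out.println(hasConsistentIndent(mixed));    // false
        System.out.println(hasConsistentIndent(tabsOnly)); // true
    }
}
```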