
Java Text Blocks: Mix of Tabs and Spaces within Indentation Prefixes

Java 15 introduced text blocks as a standard (non-preview) feature. They let you define multi-line string literals without breaking code indentation, by stripping the common whitespace prefix from the lines. The algorithm is described in JEP 378.

But how exactly is the "common whitespace prefix" defined when lines are indented with a mix of tabs and spaces?

For example, what would be the string value in the following case (· means a space, → means a tab character):

→   →   ····String text = """
→   →   ····→   line1
→   ········→   line2
→   ····→   →   """;

A simple test with OpenJDK shows that the result string is:

line1
··→   line2

So it looks like javac simply counts the leading whitespace characters on each line and uses that count, treating spaces (0x20) and tabs (0x09) equally. Is this the expected behavior?


Side note: this is not a purely theoretical question; it has practical importance for a project with mixed space/tab indentation and a large codebase.

Alex Shesterov asked Nov 03 '20 12:11


1 Answer

I've found the answer and would like to share it.

The Java compiler does indeed treat spaces, tabs, and all other whitespace characters equally.

So the same number of whitespace characters (of any kind) is removed from every line.


Details:

The javac tokenizer uses the String.stripIndent() method, whose documentation includes the following implementation note:

This method treats all white space characters as having equal width. As long as the indentation on every line is consistently composed of the same character sequences, then the result will be as described above.
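
The behavior can be reproduced without a text block by calling String.stripIndent() directly (Java 15 or later). The following is only a minimal sketch: the class and method names are made up for illustration, and the input string is my transcription of the three content lines from the question, with \t standing for each tab.

```java
// Illustrative sketch; StripIndentDemo and stripMixedIndent are hypothetical names.
class StripIndentDemo {

    // Rebuilds the text block body from the question as a plain string
    // and applies the same indentation stripping that text blocks use.
    static String stripMixedIndent() {
        String raw =
              "\t\t    \tline1\n"    // 2 tabs + 4 spaces + 1 tab  = 7 whitespace chars
            + "\t        \tline2\n"  // 1 tab  + 8 spaces + 1 tab  = 10 whitespace chars
            + "\t    \t\t";          // closing-delimiter line: 1 tab + 4 spaces + 2 tabs = 7
        // The minimum count (7) is removed from each line, regardless of
        // whether the removed characters are tabs or spaces.
        return raw.stripIndent();
    }

    public static void main(String[] args) {
        // Make whitespace visible, using the same symbols as in the question.
        String visible = stripMixedIndent().replace("\t", "→").replace(" ", "·");
        System.out.println(visible);
    }
}
```

Line 2 keeps its last three whitespace characters (two spaces and a tab), which agrees with the result observed in the question. The last line counts toward the minimum even though it is blank, which is how the position of the closing delimiter influences the stripped indentation.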

Alex Shesterov answered Oct 05 '22 23:10