Java Text Blocks: Mix of Tabs and Spaces within Indentation Prefixes

Question

Java 15 introduced text blocks as a standard (non-preview) feature. They let you define multi-line string literals without breaking code indentation, by stripping the common whitespace prefix from the lines. The algorithm is described in JEP 378.

But how exactly is the "common whitespace prefix" defined when lines are indented using a mix of tabs and spaces?

For example, what would be the string value in the following case (· means a space, → means a tab character):

→   →   ····String text = """
→   →   ····→   line1
→   ········→   line2
→   ····→   →   """;

A simple test with OpenJDK shows that the resulting string is:

line1
··→   line2

So it looks like javac simply counts the leading whitespace characters on each line and strips by that count, treating spaces (0x20) and tabs (0x09) equally. Is this the expected behavior?

Side note: this is not a purely theoretical question; it has practical importance for a project with mixed space/tab indentation and a large codebase.

Alex Shesterov · Accepted Answer

I've found the answer, which I'd like to share.

The Java compiler indeed treats spaces, tabs and all other whitespace characters equally.

So the same number of whitespace characters (of any kind) is removed from every line.
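To make the counting concrete, here is a minimal sketch of that idea applied to the two content lines from the question. The class and method names (IndentCount, leadingWhitespace, commonIndent) are made up for illustration, and the sketch deliberately ignores the special handling of the closing-delimiter line, so it is a simplification rather than the compiler's actual code.

```java
import java.util.List;

public class IndentCount {
    // Number of leading whitespace characters; a tab and a space each count as one.
    static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        return i;
    }

    // Smallest leading-whitespace count over all non-blank lines.
    // Simplified: the real algorithm also considers the closing-delimiter line.
    static int commonIndent(List<String> lines) {
        int min = Integer.MAX_VALUE;
        for (String line : lines) {
            if (!line.isBlank()) {
                min = Math.min(min, leadingWhitespace(line));
            }
        }
        return min == Integer.MAX_VALUE ? 0 : min;
    }

    public static void main(String[] args) {
        // The two content lines from the question, with tabs written as \t.
        List<String> lines = List.of(
                "\t\t    \tline1",    // 2 tabs + 4 spaces + 1 tab = 7 characters
                "\t        \tline2"   // 1 tab + 8 spaces + 1 tab = 10 characters
        );
        System.out.println(commonIndent(lines)); // prints 7
    }
}
```

With a common count of 7, the second line keeps its last three whitespace characters (two spaces and a tab), which matches the result observed in the question.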

Details:

The javac tokenizer uses the String.stripIndent() method, which has the following implementation note:

This method treats all white space characters as having equal width. As long as the indentation on every line is consistently composed of the same character sequences, then the result will be as described above.
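The behavior can be reproduced without relying on how an editor saves tabs, by calling String.stripIndent() (available since Java 15) on an ordinary string with explicit \t escapes. The following is a small sketch, assuming the raw content of the question's text block is exactly the three lines shown there; the class name StripIndentDemo and its helper methods are hypothetical.

```java
public class StripIndentDemo {
    // Raw content of the question's text block, before indentation stripping.
    // Tabs are written as \t so the example does not depend on editor settings.
    static String rawContent() {
        return "\t\t    \tline1\n"      // 7 leading whitespace characters
             + "\t        \tline2\n"    // 10 leading whitespace characters
             + "\t    \t\t";            // closing-delimiter line: 7 whitespace characters
    }

    // Applies the same method the answer refers to.
    static String stripped() {
        return rawContent().stripIndent();
    }

    public static void main(String[] args) {
        // Make the remaining whitespace visible, using the question's notation.
        String visible = stripped().replace("\t", "→").replace(" ", "·");
        System.out.print(visible);
    }
}
```

Seven characters are removed from each line regardless of whether they are tabs or spaces, so the second line is left with two spaces and a tab in front of line2.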

Tags:

java

string

java-15

java-text-blocks