
Does Java regard a 'normal' space as whitespace for the purposes of Character.isWhitespace?

This might seem like a no-brainer at first glance. It probably is, but a few things make me feel I might be missing something. First, the Java documentation for Character.isWhitespace seems to exclude it. Its definition of what does and does not count as whitespace seems very definitive ('if and only if'), and the first line explicitly excludes non-breaking spaces.

I have always regarded an ordinary 'space' - that thing you get when you press a spacebar - as a non-breaking space.

So where does it fit in the list? The list is regarded as definitive in the highest-rated answer to this question. Am I simply reading it wrong? Also, the first commenter on the top answer to this question seems to have his doubts (in a different context). Yet when I write a simple bit of code to test it, the result shows that a normal space is whitespace, as one would expect.

public class WhitespaceCheck {

    public static void main(String[] args) {
        // U+0020 SPACE, the character produced by the spacebar
        char test = ' ';

        if (Character.isWhitespace(test)) {
            System.out.println("Is whitespace!");
        } else {
            System.out.println("Is not whitespace!");
        }
    }
}

So, am I reading the first item on the list wrongly, is it somewhere else on the list, or is the list itself simply wrong?

asked Nov 19 '15 22:11 by Ger


People also ask

Is space a whitespace in Java?

A character is called a whitespace character in Java if and only if the Character.isWhitespace(char) method returns true. The most commonly used whitespace characters are \n, \t, \r and space. The regular-expression pattern for whitespace is \s, but by default Java's \s matches only [ \t\n\x0B\f\r], a narrower set than the one Character.isWhitespace accepts.
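To see that \s and Character.isWhitespace do not agree, here is a small sketch that tests a few sample characters against both. It assumes Java 7 or later (for Pattern.UNICODE_CHARACTER_CLASS); the class name and the choice of samples are purely illustrative.

```java
import java.util.regex.Pattern;

public class RegexVsIsWhitespace {

    // Default \s in java.util.regex: [ \t\n\x0B\f\r]
    private static final Pattern DEFAULT_S = Pattern.compile("\\s");
    // With UNICODE_CHARACTER_CLASS, \s follows the Unicode White_Space property
    private static final Pattern UNICODE_S =
            Pattern.compile("\\s", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean matchesDefault(char c) {
        return DEFAULT_S.matcher(String.valueOf(c)).matches();
    }

    static boolean matchesUnicode(char c) {
        return UNICODE_S.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // SPACE, TAB, NO-BREAK SPACE, EM SPACE, FILE SEPARATOR
        char[] samples = {' ', '\t', '\u00A0', '\u2003', '\u001C'};
        for (char c : samples) {
            System.out.printf("U+%04X  isWhitespace=%-5b  \\s=%-5b  \\s(Unicode)=%b%n",
                    (int) c, Character.isWhitespace(c), matchesDefault(c), matchesUnicode(c));
        }
    }
}
```

EM SPACE (U+2003) is a good example of the mismatch: Character.isWhitespace accepts it, while the default \s does not.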

What does white space mean in Java?

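A character in Java is considered a whitespace character if one of the following criteria is satisfied: it is a Unicode space character (a SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) that is not also a non-breaking space; or it is one of the control characters HORIZONTAL TABULATION (\t), LINE FEED (\n), VERTICAL TABULATION (U+000B), FORM FEED (\f), CARRIAGE RETURN (\r), FILE SEPARATOR (U+001C), GROUP SEPARATOR (U+001D), RECORD SEPARATOR (U+001E), or UNIT SEPARATOR (U+001F).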
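Rather than relying on a transcribed list, you can ask the JDK directly. This sketch scans the Basic Multilingual Plane (U+0000 to U+FFFF only, for simplicity) and prints every char that Character.isWhitespace accepts. The exact result may differ slightly between JDK versions, since each one bundles a particular Unicode version.

```java
import java.util.ArrayList;
import java.util.List;

public class ListJavaWhitespace {

    // Collects every BMP char for which Character.isWhitespace returns true.
    static List<Character> whitespaceChars() {
        List<Character> result = new ArrayList<>();
        for (int cp = 0; cp <= 0xFFFF; cp++) {
            if (Character.isWhitespace((char) cp)) {
                result.add((char) cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (char c : whitespaceChars()) {
            // Character.getName gives the Unicode name (or a block-based fallback)
            System.out.printf("U+%04X  %s%n", (int) c, Character.getName(c));
        }
    }
}
```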

Is whitespace considered a character?

In computer programming, whitespace is any character or series of characters that represent horizontal or vertical space in typography. When rendered, a whitespace character does not correspond to a visible mark, but typically does occupy an area on a page.

Do characters include spaces in Java?

The length of a string is the number of characters in the string. Thus, "cat" has length 3, "" has length 0, and "cat " has length 4. Notice that spaces count in the length, but the double quotes do not.
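A quick sketch confirming that String.length() counts spaces like any other char, with a helper (named here for illustration only) that counts how many of those chars Character.isWhitespace accepts:

```java
public class LengthWithSpaces {

    // Counts the chars in s for which Character.isWhitespace is true.
    static long countWhitespace(String s) {
        return s.chars().filter(Character::isWhitespace).count();
    }

    public static void main(String[] args) {
        String[] samples = {"cat", "", "cat "};
        for (String s : samples) {
            System.out.println("\"" + s + "\" length=" + s.length()
                    + " whitespace=" + countWhitespace(s));
        }
    }
}
```

For "cat " this reports a length of 4, one of which is whitespace.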


1 Answer

You are mistaken; a non-breaking space prevents a line break. It's a special type of space, unlike a "normal" space, which does allow a line break. If a "normal" space were a non-breaking space, lines would never wrap when you reached the edge of the screen unless you pressed return manually each time.
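To make the line-wrapping point concrete, here is a hypothetical greedy word-wrap helper (not part of any library) that breaks lines only at chars where Character.isWhitespace is true. Because a non-breaking space is not whitespace by that definition, words joined by U+00A0 stay on the same line. It is a rough sketch: a word longer than the width is simply left to overflow its line.

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveWrap {

    // Hypothetical helper: greedy wrap that breaks ONLY at Character.isWhitespace chars.
    static List<String> wrap(String text, int width) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i <= text.length(); i++) {
            // Treat the end of the text as one final break opportunity
            char c = (i == text.length()) ? ' ' : text.charAt(i);
            if (!Character.isWhitespace(c)) {
                word.append(c); // a non-breaking space ends up inside the word
                continue;
            }
            if (word.length() > 0) {
                // Start a new line if the word does not fit on the current one
                if (line.length() > 0 && line.length() + 1 + word.length() > width) {
                    lines.add(line.toString());
                    line.setLength(0);
                }
                if (line.length() > 0) {
                    line.append(' ');
                }
                line.append(word);
                word.setLength(0);
            }
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(wrap("costs 100 EUR", 9));      // normal spaces
        System.out.println(wrap("costs 100\u00A0EUR", 9)); // NBSP keeps "100 EUR" together
    }
}
```

With a width of 9, the normal-space version wraps as "costs 100" / "EUR", while the non-breaking version wraps as "costs" / "100 EUR".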

The very first line says:

It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F')
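The rule can be checked directly: the ordinary space and all three excluded non-breaking spaces belong to SPACE_SEPARATOR, but only the ordinary space passes Character.isWhitespace. A minimal sketch (the class and helper names are just for illustration):

```java
public class NonBreakingCheck {

    // True if Character.getType reports the SPACE_SEPARATOR (Zs) category.
    static boolean isSpaceSeparator(char c) {
        return Character.getType(c) == Character.SPACE_SEPARATOR;
    }

    public static void main(String[] args) {
        // U+0020 SPACE, then the three non-breaking spaces the Javadoc excludes
        char[] samples = {' ', '\u00A0', '\u2007', '\u202F'};
        for (char c : samples) {
            System.out.printf("U+%04X  SPACE_SEPARATOR=%-5b  isWhitespace=%b%n",
                    (int) c, isSpaceSeparator(c), Character.isWhitespace(c));
        }
    }
}
```

So the "but is not also a non-breaking space" clause removes exactly those three characters from the category; the normal space is untouched by it.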

A list of Unicode whitespace characters, including the members of SPACE_SEPARATOR, can be found here:

https://en.wikipedia.org/wiki/Whitespace_character

The documentation for SPACE_SEPARATOR says that it refers to a category of Unicode characters, not a specific character. The 'normal' space of the title (U+0020 SPACE, which is what the spacebar usually produces) is included in this category.
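Since SPACE_SEPARATOR is a category, you can list its members. This sketch scans the Basic Multilingual Plane only and prints each Zs character along with whether Character.isWhitespace accepts it. The exact membership depends on the Unicode version of the JDK you run it on.

```java
import java.util.ArrayList;
import java.util.List;

public class SpaceSeparatorMembers {

    // Collects every BMP char whose general category is SPACE_SEPARATOR (Zs).
    static List<Character> members() {
        List<Character> result = new ArrayList<>();
        for (int cp = 0; cp <= 0xFFFF; cp++) {
            if (Character.getType(cp) == Character.SPACE_SEPARATOR) {
                result.add((char) cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (char c : members()) {
            System.out.printf("U+%04X  isWhitespace=%b%n", (int) c, Character.isWhitespace(c));
        }
    }
}
```

U+0020 comes out first in the list, since no character below it is a space separator.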

answered Oct 02 '22 09:10 by Tim B