
How to trim no-break space in Java?

Tags:

java

string

I have an input file that I need to process, discarding all whitespace, including the non-breaking space U+00A0 (aka &nbsp;; you can produce it in Notepad by holding Alt and typing 0160 on the numeric keypad) and any other form of whitespace. I have tried String.trim(), but it doesn't trim U+00A0.

Do I need to explicitly check for U+00A0 and then call trim(), or is there an easy way to trim all kinds of whitespace in Java?

Abhishek asked Feb 03 '15 09:02



1 Answer

Although &nbsp; (U+00A0) is a non-breaking space (a space that is not meant to be treated as ordinary whitespace), you can still trim it from both ends of a string, while preserving every &nbsp; inside the string, with a simple regex:

string.replaceAll("(^\\h*)|(\\h*$)","") 
  • \h is a horizontal whitespace character: [ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000]

The \h class is only available from Java 8 onward. On an earlier JDK, use the explicit character list above in place of \h.
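The pre-Java-8 fallback can be sketched as follows. This is a minimal illustration, not a definitive implementation: the class name NbspTrim and the method trimHorizontal are hypothetical names chosen for the example, and the character list is copied from the bullet above. It also shows that String.trim() leaves U+00A0 in place, because trim() only strips characters up to U+0020.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from the answer.
class NbspTrim {
    // Explicit equivalent of \h, usable on JDKs older than Java 8.
    private static final String H =
        "[ \\t\\xA0\\u1680\\u180e\\u2000-\\u200a\\u202f\\u205f\\u3000]";

    // Matches a run of horizontal whitespace at the start or at the end.
    private static final Pattern EDGES =
        Pattern.compile("^" + H + "+|" + H + "+$");

    // Removes leading and trailing horizontal whitespace, including U+00A0,
    // and leaves any whitespace inside the string untouched.
    static String trimHorizontal(String s) {
        return EDGES.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String raw = "\u00A0 \tfoo\u00A0bar \u00A0";
        // trim() stops at the outer U+00A0 characters, so nothing is removed.
        System.out.println(raw.trim().length());          // 12
        // Only "foo", the inner U+00A0, and "bar" remain.
        System.out.println(trimHorizontal(raw).length()); // 7
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regex on every call, which String.replaceAll would otherwise do.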

Cfx answered Sep 28 '22 00:09