My application uses Spring Integration to poll email from an Outlook mailbox.
Because it receives the string (the email body) from an external system (Outlook), I have no control over its content.
For example:
String emailBodyStr= "rejected by sundar14-\u200B.";
Now I am trying to remove the Unicode character \u200B from this string.
What I have tried already:
Try#1:
emailBodyStr = emailBodyStr.replaceAll("\u200B", "");
Try#2:
emailBodyStr = emailBodyStr.replaceAll("\u200B", "").trim();
Try#3 (using Apache Commons):
StringEscapeUtils.unescapeJava(emailBodyStr);
Try#4:
StringEscapeUtils.unescapeJava(emailBodyStr).trim();
Nothing has worked so far.
When I print this string using the code below:
logger.info("Comment BEFORE:{}",emailBodyStr);
logger.info("Comment AFTER :{}",emailBodyStr);
In the Eclipse console, it does NOT print the Unicode character:
Comment BEFORE:rejected by sundar14-.
But the same code prints the Unicode character in the Linux console, as below:
Comment BEFORE:rejected by sundar14-\u200B.
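Since the two consoles render the same string differently, one way to see what the string really contains is to dump its code points instead of printing it. This is only a minimal sketch: the class name CodePointDump and the method describe are hypothetical, not part of my application. A real zero-width space shows up as a single U+200B, whereas a literal backslash followed by "u200B" would show up as six separate code points starting with U+005C.

```java
class CodePointDump {

    // Renders every code point of the input as U+XXXX, separated by spaces,
    // so invisible characters cannot hide behind console rendering.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Tail of the sample string: hyphen, zero-width space, full stop.
        System.out.println(describe("-\u200B.")); // prints "U+002D U+200B U+002E"
    }
}
```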
I read some examples where str.replace() is recommended, but please note that those examples use JavaScript and PHP, not Java.
The replace() method can be used to remove Unicode zero-width non-joiner (\u200c) characters from a string. The same approach can be used to remove zero-width space (\u200b) characters.
Encoding. The zero-width space character is encoded in Unicode as U+200B ZERO WIDTH SPACE, and input in HTML as &#8203;, &#x200B; or &ZeroWidthSpace;.
The zero-width space is Unicode character U+200B (HTML &#8203;). It's remarkably hard to type. On Windows you can type Alt-8203.
Finally, I was able to remove the 'Zero Width Space' character by using a Unicode regex. The class \p{Cf} matches every character in the Unicode "Format" category, which includes U+200B:
String plainEmailBody = emailBodyStr.replaceAll("\\p{Cf}", "");
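For anyone who wants to try this outside my application, here is a self-contained sketch of the same idea. The class name ZeroWidthCleaner and the method stripFormatChars are made up for illustration; the only difference from the one-liner is that the pattern is compiled once and reused, which may help if many emails are processed.

```java
import java.util.regex.Pattern;

class ZeroWidthCleaner {

    // \p{Cf} is the Unicode "Format" general category; U+200B belongs to it.
    private static final Pattern FORMAT_CHARS = Pattern.compile("\\p{Cf}");

    // Returns the input with every format-category character removed.
    static String stripFormatChars(String input) {
        return FORMAT_CHARS.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String emailBodyStr = "rejected by sundar14-\u200B.";
        String plainEmailBody = stripFormatChars(emailBodyStr);

        System.out.println(emailBodyStr.length());   // prints 23
        System.out.println(plainEmailBody.length()); // prints 22
        System.out.println(plainEmailBody);          // prints "rejected by sundar14-."
    }
}
```

Be aware that \p{Cf} is broader than just U+200B: it also covers the soft hyphen (U+00AD), the zero-width joiner (U+200D, used inside some emoji sequences), bidirectional marks and the byte order mark (U+FEFF). If only specific characters should go, an explicit class such as "[\u200B\u200C]" is a narrower alternative.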
Reference to find the category of Unicode characters.
Java's Character class lists all of these Unicode categories.
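The category of a given character can also be checked directly in Java with Character.getType, which returns one of the category constants defined on the Character class. A small sketch (the class name UnicodeCategoryCheck and the helper isFormatChar are hypothetical):

```java
class UnicodeCategoryCheck {

    // True when the code point is in the Unicode "Format" (Cf) category,
    // i.e. the same set of characters that the regex \p{Cf} matches.
    static boolean isFormatChar(int codePoint) {
        return Character.getType(codePoint) == Character.FORMAT;
    }

    public static void main(String[] args) {
        System.out.println(isFormatChar(0x200B)); // prints true (zero-width space)
        System.out.println(isFormatChar('-'));    // prints false (dash punctuation)
    }
}
```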
Note 1: Because I received this string from an Outlook email body, none of the approaches listed in my question worked. My application receives the string from an external system (Outlook), so I have no control over it.
Note 2: This SO answer helped me learn about Unicode regular expressions.