
Zero Width Space character in string literals: Detect and/or prevent

Visual Studio allows the Unicode character Zero Width Space (U+200B) to be pasted into the code editor. This character is (as the name implies) invisible.

This can lead to nasty bugs such as the one I just spent time troubleshooting where HttpWebRequest.CreateHttp(string url) threw a UriFormatException when passed the innocuous-looking string literal "​http://www.umich.edu".

The exception occurred because the first character in that string literal is not h, as it appears, but is the Zero Width Space character. It was put onto the clipboard when I copied the URL from the body of a web page, and was dutifully pasted into my code by Visual Studio when I hit Ctrl+V in the code editor window.
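One way to confirm a diagnosis like this outside the editor is to print the code points of the first few characters of the suspect string. The following is only a minimal sketch in Python, used here for illustration since the original code is .NET; the helper name `describe` is hypothetical, and the hidden character is written as an explicit `\u200b` escape so that it shows up in the source.

```python
import unicodedata

# The literal from the question, with the hidden character written as an
# explicit escape so that it is visible in source.
url = "\u200bhttp://www.umich.edu"


def describe(text, limit=3):
    """Return (code point, Unicode name) pairs for the first `limit` characters."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text[:limit]
    ]


print(describe(url))
# [('U+200B', 'ZERO WIDTH SPACE'), ('U+0068', 'LATIN SMALL LETTER H'), ('U+0074', 'LATIN SMALL LETTER T')]
```

The first entry shows that the string starts with U+200B rather than `h`, which is consistent with the URL failing to parse.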

Turning on the "View White Space" option (Edit > Advanced > View White Space) does not cause Visual Studio to reveal that there's a Zero Width Space character present.

I would like Visual Studio to:

  • Give some kind of indication or warning when there's a Zero Width Space character (or other invisible character) in a string literal in my code, and/or
  • Prevent such invisible characters from being pasted into the code editor in the first place.

Is there a way to make Visual Studio do this?

Jon Schneider asked Aug 14 '15 at 15:08





1 Answer

The Gremlins tracker for Visual Studio Code worked for me.
https://github.com/nhoizey/vscode-gremlins

P.S.: IntelliJ flags these troublesome hidden characters out of the box.
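If no editor extension is available, a small script can flag the same kind of character, for example as a pre-commit check. This is only a sketch under stated assumptions: it is written in Python rather than as a Visual Studio feature, the function name `find_invisible` is hypothetical, and it treats every character in Unicode general category Cf (format characters, which includes U+200B) as suspicious. That category also covers characters such as the zero width joiner and the byte order mark, which can be legitimate in some files.

```python
import unicodedata


def find_invisible(source):
    """Yield (line, column, code point, name) for each format character (category Cf)."""
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                yield line_no, col, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")


# Hypothetical source text: the first line hides a zero width space
# right after the opening quote, the second line is clean.
sample = 'var url = "\u200bhttp://www.umich.edu";\nvar ok = "http://example.com";'

for hit in find_invisible(sample):
    print(hit)
# (1, 12, 'U+200B', 'ZERO WIDTH SPACE')
```

To check real files, the same function could be fed the contents of each source file, read as UTF-8, with the build failing if any hit is reported.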

4F2E4A2E answered Oct 26 '22 at 06:10