
Remove zero-width space characters from a JavaScript string

I take user input (JS code) and execute (process) it in real time to show some output.

Sometimes the code has those zero-width spaces; it's really weird. I don't know how the users are inputting that. Example: "(​$".length === 3

I need to be able to remove that character from the code in JS. How do I do so? Or is there some other way to execute the JS code so that the browser ignores the zero-width space characters?

user1437328 asked Jul 03 '12 06:07



2 Answers

Unicode has the following zero-width characters:

  • U+200B zero width space
  • U+200C zero width non-joiner
  • U+200D zero width joiner
  • U+FEFF zero width no-break space

To remove them from a string in JavaScript, you can use a simple regular expression:

    var userInput = 'a\u200Bb\u200Cc\u200Dd\uFEFFe';
    console.log(userInput.length); // 9
    var result = userInput.replace(/[\u200B-\u200D\uFEFF]/g, '');
    console.log(result.length); // 5
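For the use case in the question (running user-supplied code), the same regular expression could be wrapped in a small helper that is applied before the code is executed. This is only a sketch: the name stripZeroWidth and the use of new Function to run the code are assumptions, not something the question specifies.

```javascript
// Hypothetical helper; the name stripZeroWidth is an assumption for this sketch.
var ZERO_WIDTH = /[\u200B-\u200D\uFEFF]/g;

function stripZeroWidth(source) {
  // Caveat: this also strips the characters inside string literals, where a
  // zero width joiner can be intentional (for example in emoji sequences).
  return source.replace(ZERO_WIDTH, '');
}

var dirty = 'return 1 +\u200B 2;';

try {
  new Function(dirty); // U+200B is not JavaScript whitespace, so parsing fails
} catch (e) {
  console.log(e.name); // SyntaxError
}

var run = new Function(stripZeroWidth(dirty));
console.log(run()); // 3
```

If the stripped characters might matter inside user strings, it may be safer to report them to the user instead of silently removing them.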

Note that many other characters may also be invisible, for example some of ASCII’s control characters.
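To see which invisible characters an input actually contains before deciding what to strip, something like the following could help. It is a rough sketch: findInvisible is a made-up name, and it assumes an engine that supports Unicode property escapes (the u flag with \p{...}, ES2018).

```javascript
// Hypothetical diagnostic helper: lists invisible format characters and their positions.
function findInvisible(str) {
  var found = [];
  // \p{Cf} is the Unicode "Format" category, which includes U+200B-U+200D and U+FEFF.
  // U+2028 and U+2029 are separators, so they are listed explicitly.
  var re = /[\p{Cf}\u2028\u2029]/gu;
  var match;
  while ((match = re.exec(str)) !== null) {
    var hex = match[0].codePointAt(0).toString(16).toUpperCase();
    found.push({ index: match.index, codePoint: 'U+' + hex.padStart(4, '0') });
  }
  return found;
}

console.log(findInvisible('(\u200B$'));
// [ { index: 1, codePoint: 'U+200B' } ]
```

ASCII control characters fall under a different category (\p{Cc}), which also covers tabs and newlines, so they are left out of this sketch on purpose.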

Mathias Bynens answered Oct 10 '22 03:10


I had a problem where some invisible characters were corrupting my JSON and causing an Unexpected token ILLEGAL exception, which was crashing my site.

Here is my solution using a RegExp variable:

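    // U+2028 is LINE SEPARATOR and U+2029 is PARAGRAPH SEPARATOR.
    // The "g" flag removes every occurrence, not only the first one.
    var re = new RegExp("\u2028|\u2029", "g");
    var result = text.replace(re, '');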

You can find more about JavaScript and zero-width spaces here: Zero Width Spaces
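If the separators are part of the data and should survive, an alternative to deleting them is to escape them in the serialized JSON. The sketch below assumes the JSON text is later embedded in JavaScript source, which is where U+2028 and U+2029 caused syntax errors in engines older than ES2019; escapeLineSeparators is a hypothetical name.

```javascript
// Hypothetical helper: replaces raw U+2028/U+2029 with their JSON escape sequences.
function escapeLineSeparators(json) {
  return json.replace(/\u2028/g, '\\u2028').replace(/\u2029/g, '\\u2029');
}

var payload = JSON.stringify({ note: 'line one\u2028line two' });
var safe = escapeLineSeparators(payload);

console.log(safe.indexOf('\u2028')); // -1
console.log(JSON.parse(safe).note === 'line one\u2028line two'); // true
```

The escaped text contains no raw separator, yet parsing it gives back the original string.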

Technotronic answered Oct 10 '22 04:10