 

\u200b (Zero width space) characters in my JS code. Where did they come from?

I am developing a front end of a web app using NetBeans IDE 7.0.1. Recently I had a very nasty bug, which I finally fixed.

Say I have this code:

var element = '<input size="3" id="foo" name="elements[foo][0]" />';
$('#bar').append(element);

I noticed that something had gone wrong when I saw that the size attribute didn't work in Chrome (I didn't check other browsers). When I opened that element in the Inspector, it was interpreted as something like

<input id="&quot;3&quot;" name="&quot;elements[foo][0]&quot;"      size="&quot;foo&quot;" /> 

Which was rather strange. After manually retyping the element string character by character, the bug was gone. When I undid that change, I noticed that NetBeans warned me about some Unicode characters in my old code. They were \u200b, zero width spaces, after each '=', between '][' and at the end of the string. So the string looked normal because zero width spaces aren't displayed, but after escaping them my string was

'<input size=\u200b"3" id=\u200b"foo" name=\u200b"elements[foo]\u200b[0]" />\u200b' 
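One way to surface such characters without relying on the IDE is to escape everything outside printable ASCII before logging the string. This is only a minimal sketch; `showInvisible` is a hypothetical helper name, not something provided by NetBeans or Chrome.

```javascript
// Hypothetical helper: replace every UTF-16 code unit outside printable
// ASCII (0x20-0x7E) with its \uXXXX escape, so invisible characters show up.
// Note that this also escapes tabs, newlines and legitimate non-ASCII text.
function showInvisible(str) {
  return str.replace(/[^\x20-\x7E]/g, function (ch) {
    var hex = ch.charCodeAt(0).toString(16);
    return "\\u" + ("0000" + hex).slice(-4);
  });
}

// The zero width spaces are written as escapes here so they stay visible.
var suspect = '<input size=\u200b"3" id=\u200b"foo" />';
console.log(showInvisible(suspect));
// prints: <input size=\u200b"3" id=\u200b"foo" />
```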

Now where the hell did I get them?

I'm not sure where I copied the element code from, but it was definitely one of the following:

  • Other pane of Netbeans Editor with HTML template file;
  • Google Chrome Inspector, 'Copy as HTML' action;
  • Google Chrome source view page (very unlikely).

But I can't reproduce the bug with any of them.

I use NetBeans 7.0.1 and Google Chrome 13.0 under Windows 7. No keyboard switchers or anything like that are running. Also, I'm using Git for version control, but I didn't pull that code, so it is very unlikely that Git is to blame. It can't be a stupid joke by my colleagues, because they are quite well-mannered.

Any suggestions as to who messed up my code?

Hnatt asked Aug 14 '11 07:08



2 Answers

Here's a stab in the dark.

My bet would be on Google Chrome Inspector. Searching through the Chromium source, I spotted the following block of code

    if (hasText)
        attrSpanElement.appendChild(document.createTextNode("=\u200B\""));

    if (linkify && (name === "src" || name === "href")) {
        var rewrittenHref = WebInspector.resourceURLForRelatedNode(node, value);
        value = value.replace(/([\/;:\)\]\}])/g, "$1\u200B");
        attrSpanElement.appendChild(linkify(rewrittenHref, value, "webkit-html-attribute-value", node.nodeName().toLowerCase() === "a"));
    } else {
        value = value.replace(/([\/;:\)\]\}])/g, "$1\u200B");
        var attrValueElement = attrSpanElement.createChild("span", "webkit-html-attribute-value");
        attrValueElement.textContent = value;
    }

It's quite possible that I'm simply barking up the wrong tree here, but it looks like zero-width spaces were being inserted (to handle soft text wrapping?) during the display of attributes. Perhaps the "Copy as HTML" function had not properly removed them?
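To see what that replacement does in isolation, here is a small sketch that applies the same regular expression to the name attribute value from the question. It only illustrates the string transformation; it does not show how the Inspector renders or copies anything.

```javascript
// Same replacement as in the quoted Inspector code, applied to the
// attribute value from the question.
var value = "elements[foo][0]";
var displayed = value.replace(/([\/;:\)\]\}])/g, "$1\u200B");

console.log(value.length);                 // 16
console.log(displayed.length);             // 18: one U+200B after each "]"
console.log(displayed.indexOf("\u200B"));  // 13: right after "elements[foo]"
```

The first inserted character lands between "]" and "[", which lines up with the \u200b the question found between '][', and the text node "=\u200B\"" would account for one after each '='.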


Update

After fiddling with the Chrome element inspector, I'm almost convinced that's where your stray \u200b came from. Notice how the line can wrap not only at visible spaces but also after = or characters matched by /([\/;:\)\]\}])/, thanks to the inserted zero-width spaces.

[Image: Chrome inspector screenshot]

Unfortunately, I am unable to replicate your problem where they inadvertently get included into your clipboard (I used Chrome 13.0.782.112 on Win XP).

It would certainly be worth submitting a bug report should you be able to reproduce the behaviour.

Shawn Chin answered Oct 19 '22 03:10


This happened to me when I copied source code from another site into my editor. If you're using Visual Studio Code, Atom, or Sublime Text, these plugins will highlight those pesky characters (zero-width space \u200b, etc.).

  • VSCode: https://marketplace.visualstudio.com/items?itemName=nhoizey.gremlins
  • Atom editor: https://atom.io/packages/highlight-bad-chars
  • Sublime Text: https://packagecontrol.io/packages/Gremlins
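If an editor plugin isn't an option, the characters can also be stripped programmatically. This is a minimal sketch; `stripZeroWidth` is a hypothetical helper, and the set it removes (U+200B, U+200C, U+200D, U+FEFF) is an assumption about what counts as unwanted. U+200C and U+200D are meaningful in some scripts and in emoji sequences, so narrow the character class if your text may contain them.

```javascript
// Hypothetical helper: remove zero width space, zero width non-joiner,
// zero width joiner and the byte order mark / zero width no-break space.
function stripZeroWidth(str) {
  return str.replace(/[\u200B-\u200D\uFEFF]/g, "");
}

// The polluted string from the question, with the invisible characters
// written as escapes so they can be seen in the source.
var dirty = '<input size=\u200b"3" id=\u200b"foo" name=\u200b"elements[foo]\u200b[0]" />\u200b';
var clean = stripZeroWidth(dirty);

console.log(clean === '<input size="3" id="foo" name="elements[foo][0]" />'); // true
```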
Clifford Fajardo answered Oct 19 '22 02:10