
What is this unicode invisible character?

While trying to parse some unicode text strings, I'm hitting an invisible character that I can't find any definition for. If I paste it into a text editor and show invisibles, I can see that it looks like a bullet point (• alt-8), and by copy/pasting it, I can see it has an effect like a space or tab, but it's neither of those.

I need to test for it, something like...

 if(uniChar == L'\t') 

But of course I need to provide something to match to.

It has bytes 0xc2 0xa0 in UTF-8.

If no-one has a definition, is there any devious way to test for something I can't define!?

(I happen to be using NSStrings in Objective-C, OSX, Xcode, but I don't think that has any bearing.)

Joey FourSheds asked Jan 14 '23 02:01



1 Answer

Bytes C2 A0 in UTF-8 encode U+00A0 ɴᴏ-ʙʀᴇᴀᴋ sᴘᴀᴄᴇ, which can be used, for example, to display combining marks in isolation. It is &nbsp; as a named HTML entity. It is almost the same as a U+0020 sᴘᴀᴄᴇ, except it prevents line breaks before or after it, and acts as a numerical separator for bidirectional layout.
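
To see why those two bytes come out as U+00A0, here is a minimal sketch in C of decoding a two-byte UTF-8 sequence by hand. The helper name decode_utf8_2 is made up for illustration and is not part of any library; it handles only the two-byte form and returns 0 for anything else.

```c
#include <stdint.h>

/* Hypothetical helper: decode a two-byte UTF-8 sequence of the form
   110xxxxx 10yyyyyy into the code point xxxxxyyyyyy.
   Returns 0 if the bytes are not a valid two-byte sequence. */
uint32_t decode_utf8_2(uint8_t b0, uint8_t b1)
{
    /* The lead byte must start with bits 110, the trail byte with bits 10. */
    if ((b0 & 0xE0) != 0xC0 || (b1 & 0xC0) != 0x80)
        return 0;

    /* Five payload bits from the lead byte, six from the trail byte. */
    uint32_t cp = ((uint32_t)(b0 & 0x1F) << 6) | (uint32_t)(b1 & 0x3F);

    /* Values below 0x80 must use the one-byte form; reject overlong input. */
    return cp >= 0x80 ? cp : 0;
}
```

With the bytes from the question, 0xC2 contributes the bits 00010 and 0xA0 contributes 100000, which together give 0x00A0.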

The dot you see when you ask a text editor to show invisibles just happens to be what glyph the text editor chose to display spaces. It does not mean the character in question is U+00B7 ᴍɪᴅᴅʟᴇ ᴅᴏᴛ, which is definitely not invisible.

In code, if you have it as a unichar, you can compare it to L'\x00A0'.
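
As a rough sketch of what such a test might look like in plain C: the type name utf16_unit and both function names below are invented for this example, with utf16_unit standing in for Foundation's unichar (a 16-bit UTF-16 code unit). Which characters to treat as "space-like" is an assumption here; adjust the list to whatever your parser needs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for Foundation's unichar (a 16-bit UTF-16 code unit). */
typedef uint16_t utf16_unit;

/* True only for U+00A0 NO-BREAK SPACE. */
bool is_no_break_space(utf16_unit c)
{
    return c == 0x00A0;
}

/* True for the blank characters mentioned in the question:
   tab, ordinary space, and no-break space. */
bool is_space_like(utf16_unit c)
{
    return c == 0x0009 || c == 0x0020 || c == 0x00A0;
}
```

In Objective-C the same comparison should work directly on the value returned by characterAtIndex:, since that is already a unichar.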

R. Martinho Fernandes answered Feb 16 '23 22:02
