Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to search a unicode character using its code point in sublime text

From what I understand, unicode characters have various representations.

e.g., code point or hex byte (these two representations are not always the same if UTF-8 encoding is used).

If I want to search for a visible unicode character (e.g., ) I can just copy it and search. This works even if I do not know its underlying unicode representation. But for other characters which may not be easily visible, such as zeros width space, that way does not work well. For these characters, we may want to search it using its code point.

My question

If I have known a character's code point, how do I search it in sublime text using regular expression? I highlight sublime text because different editors may use different format.

like image 988
jdhao Avatar asked Dec 13 '17 02:12

jdhao


People also ask

How do I search for a Unicode character?

• Search by Unicode Character Name: If you know the full or partial Unicode character name and would like to find the actual character associated with that name, then type the character name here. For example, type ” RADIOACTIVE SIGN ” in the Search by Unicode Name search box to show the actual RADIOACTIVE SIGN character.

How to find the Unicode symbol number and HTML-code?

Each Unicode character has its own number and HTML-code. Example: Cyrillic capital letter Э has number U+042D (042D – it is hexadecimal number), code ъ. In a table, letter Э located at intersection line no. 0420 and column D. If you want to know number of some Unicode symbol, you may found it in a table. Or paste it to the search string.

What is a Unicode character lookup table?

This Unicode Character Lookup Table is a reference tool to search for Unicode characters (or symbols) by Unicode Character Name or Unicode Number (or Code Point). It is also a Unicode character detector tool if you search the table using the actual Unicode character.

What is the Unicode counter in the text messages?

Our Unicode counter calculates the exact characters in the text message. It is limited to 160 GSM characters and 70 Unicode symbol characters. Our Unicode counter calculates the exact characters in the text message. It is limited to 160 GSM characters and 70 Unicode symbol characters.


2 Answers

  1. Zero width space characters can be found via:

\x{200b}

Demo

  1. Non breaking space characters can be found via:

\xa0

Demo

like image 94
CinCout Avatar answered Nov 09 '22 05:11

CinCout


For unicode character whose code point is CODE_POINT (code point must be in hexadecimal format), we can safely use regular expression of the format \x{CODE_POINT} to search it.

General rules

For unicode characters whose code points can fit in two hex digits, it is fine to use \x without curly braces, but for those characters whose code points are more than two hex digits, you have to use \x followed by curly braces.

Some examples

For example, in order to find character A, you can use either \x{41} or \x41 to search it.

As another example, in order to find (according to here, its code point is U+6211), you have to use \x{6211} to search it instead of \x6211 (see image below). If you use \x6211, you will not find the character .

enter image description here

like image 20
jdhao Avatar answered Nov 09 '22 04:11

jdhao