From what I understand, unicode characters have various representations.
e.g., code point or hex byte (these two representations are not always the same if UTF-8 encoding is used).
If I want to search for a visible unicode character (e.g., 汉
) I can just copy it and search. This works even if I do not know its underlying unicode representation. But for other characters which may not be easily visible, such as zeros width space, that way does not work well. For these characters, we may want to search it using its code point.
If I have known a character's code point, how do I search it in sublime text using regular expression? I highlight sublime text because different editors may use different format.
• Search by Unicode Character Name: If you know the full or partial Unicode character name and would like to find the actual character associated with that name, then type the character name here. For example, type ” RADIOACTIVE SIGN ” in the Search by Unicode Name search box to show the actual RADIOACTIVE SIGN character.
Each Unicode character has its own number and HTML-code. Example: Cyrillic capital letter Э has number U+042D (042D – it is hexadecimal number), code ъ. In a table, letter Э located at intersection line no. 0420 and column D. If you want to know number of some Unicode symbol, you may found it in a table. Or paste it to the search string.
This Unicode Character Lookup Table is a reference tool to search for Unicode characters (or symbols) by Unicode Character Name or Unicode Number (or Code Point). It is also a Unicode character detector tool if you search the table using the actual Unicode character.
Our Unicode counter calculates the exact characters in the text message. It is limited to 160 GSM characters and 70 Unicode symbol characters. Our Unicode counter calculates the exact characters in the text message. It is limited to 160 GSM characters and 70 Unicode symbol characters.
\x{200b}
Demo
\xa0
Demo
For unicode character whose code point is CODE_POINT
(code point must be in hexadecimal format), we can safely use regular expression of the format \x{CODE_POINT}
to search it.
For unicode characters whose code points can fit in two hex digits, it is fine to use \x
without curly braces, but for those characters whose code points are more than two hex digits, you have to use \x
followed by curly braces.
For example, in order to find character A
, you can use either \x{41}
or \x41
to search it.
As another example, in order to find 我
(according to here, its code point is U+6211
), you have to use \x{6211}
to search it instead of \x6211
(see image below). If you use \x6211
, you will not find the character 我
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With