Unicode look-alikes

Question

I am looking for an easy way to match all unicode characters that look like a given letter. Consider an example for a selection of characters that look like small letter n.

import re
import sys

sys.stdout.reconfigure(encoding='utf-8')

look_alike_chars = [ 'n', 'ń', 'ņ', 'ň', 'ŉ', 'ŋ', 'ո', 'п', 'ո']
pattern = re.compile(r'[nńņňŉŋոп]')


for char in look_alike_chars:
    if pattern.match(char):
        print(f"Character '{char}' matches the pattern.")
    else:
        print(f"Character '{char}' does NOT match the pattern.")

Instead of r'[nńņňŉŋոп]' I expect something like r'[\LookLike{n}] where LookLike is a token.

If this is not possible, do you know of a program or website that lists all the look-alike symbols for a given ASCII letter?

Lrx · Accepted Answer

There is a Python library called Unidecode that will translate some of these special Unicode character to the corresponding ascii character. A code sample should be this:

import unidecode

input_string = "ñ"
clean_string = unidecode.unidecode(input_string) # = 'n'

However, the correspondences are made based on their actual idiom equivalent.

So whenever you want to translate a capital Greek 'PI' (written п), it will return a 'p' because pi is equivalent to a 'p' and not to an 'n'. Besides there is no publicly available database of such correspondences. Feel free to build one and share it!

Unicode look-alikes

Tags:

python

string

unicode

sanitizedUser

1 Answers

Lrx

Recent Activity

Donate For Us

Unicode look-alikes

Tags:

python

string

unicode

sanitizedUser

1 Answers

Lrx

Related questions

Recent Activity

Donate For Us