Is there a list of characters that look similar to English letters?

I’m having a crack at profanity filtering for a web forum written in Python.

As part of that, I’m attempting to write a function that takes a word, and returns all possible mock spellings of that word that use visually similar characters in place of specific letters (e.g. s†å©køv€rƒ|øw).

I expect I’ll have to expand this list over time to cover people’s creativity, but is there a list floating around anywhere on the internet that I could use as a starting point?

550

asked Feb 29 '12 00:02

Paul D. Waite

3 Answers

This is probably both far deeper than you need and not wide enough to cover your use case, but the Unicode Consortium have had to deal with attacks against internationalised domain names and came up with this list of homoglyphs (characters with the same or similar rendering):

http://www.unicode.org/Public/security/latest/confusables.txt

Might make a starting point at least.
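If it helps, here is a minimal sketch of how that file could be turned into a lookup table. It assumes the usual layout of the data lines (hex source code point, then the hex target code point sequence, then a type field, separated by semicolons, with `#` starting a comment). The function name `parse_confusables` and the three sample lines are made up for illustration; in practice you would feed it the lines of a downloaded copy of the file.

```python
from collections import defaultdict


def parse_confusables(lines):
    """Map each target string to the set of source strings listed as resembling it."""
    lookalikes = defaultdict(set)
    for line in lines:
        # Drop a leading byte-order mark, then the trailing comment.
        line = line.lstrip("\ufeff").split("#", 1)[0].strip()
        if not line:
            continue  # blank line or comment-only line
        fields = [field.strip() for field in line.split(";")]
        if len(fields) < 2:
            continue  # not a data line in the assumed layout
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        lookalikes[target].add(source)
    return lookalikes


# Illustrative sample lines written in the assumed layout (not a copy of the file).
SAMPLE = [
    "# comment-only line",
    "0435 ;\t0065 ;\tMA\t# CYRILLIC SMALL LETTER IE resembles LATIN SMALL LETTER E",
    "FF45 ;\t0065 ;\tMA\t# FULLWIDTH LATIN SMALL LETTER E resembles LATIN SMALL LETTER E",
    "0441 ;\t0063 ;\tMA\t# CYRILLIC SMALL LETTER ES resembles LATIN SMALL LETTER C",
]

table = parse_confusables(SAMPLE)
print(sorted(table["e"]))
```

Keying the table by the plain letter means you can look up "which characters might stand in for e?" directly, which is the direction a filter needs.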

148

answered Oct 16 '22 05:10

Robin Whittleton

http://en.wikipedia.org/wiki/Letterlike_Symbols

It's much, much less comprehensive, but more comprehensible.
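Many of the characters in that block carry a Unicode compatibility mapping back to a plain letter, so a cheap first pass (a sketch, not a full solution) is to run the text through NFKC normalisation with the standard `unicodedata` module. The helper name `fold_letterlike` is just something I made up for the example.

```python
import unicodedata


def fold_letterlike(text):
    """Fold compatibility variants (script, fullwidth, mathematical bold, ...) to plain letters."""
    return unicodedata.normalize("NFKC", text)


# Script capital H, fullwidth e, two mathematical bold l's, script small o.
print(fold_letterlike("ℋｅ𝐥𝐥ℴ"))  # prints "Hello"
```

This only catches characters that Unicode itself defines as variants of a letter. Look-alikes from other scripts, such as Cyrillic е, pass through unchanged, so you would still need a confusables-style table on top.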

answered Oct 16 '22 06:10

spnzr

I created a Python class to do exactly this, based on Robin's Unicode link for "confusables":

https://github.com/wanderingstan/Confusables

For example, "Hello" would get expanded into the following set of regexp character classes:

[H\Ｈ\ℋ\ℌ\ℍ\𝐇\𝐻\𝑯\𝓗\𝕳\𝖧\𝗛\𝘏\𝙃\𝙷\Η\𝚮\𝛨\𝜢\𝝜\𝞖\Ⲏ\Н\Ꮋ\ᕼ\ꓧ\𐋏\Ⱨ\Ң\Ħ\Ӊ\Ӈ] [e\℮\ｅ\ℯ\ⅇ\𝐞\𝑒\𝒆\𝓮\𝔢\𝕖\𝖊\𝖾\𝗲\𝘦\𝙚\𝚎\ꬲ\е\ҽ\ɇ\ҿ] [l\‎\|\∣\⏽\￨1\‎\۱\𐌠\‎\𝟏\𝟙\𝟣\𝟭\𝟷I\Ｉ\Ⅰ\ℐ\ℑ\𝐈\𝐼\𝑰\𝓘\𝕀\𝕴\𝖨\𝗜\𝘐\𝙄\𝙸\Ɩ\ｌ\ⅼ\ℓ\𝐥\𝑙\𝒍\𝓁\𝓵\𝔩\𝕝\𝖑\𝗅\𝗹\𝘭\𝙡\𝚕\ǀ\Ι\𝚰\𝛪\𝜤\𝝞\𝞘\Ⲓ\І\Ӏ\‎\‎\‎\‎\‎\‎\‎\‎\ⵏ\ᛁ\ꓲ\𖼨\𐊊\𐌉\‎\‎\ł\ɭ\Ɨ\ƚ\ɫ\‎\‎\‎\‎\ŀ\Ŀ\ᒷ\🄂\⒈\‎\⒓\㏫\㋋\㍤\⒔\㏬\㍥\⒕\㏭\㍦\⒖\㏮\㍧\⒗\㏯\㍨\⒘\㏰\㍩\⒙\㏱\㍪\⒚\㏲\㍫\ǉ\Ĳ\‖\∥\Ⅱ\ǁ\‎\𐆙\⒒\Ⅲ\𐆘\㏪\㋊\㍣\Ю\⒑\㏩\㋉\㍢\ʪ\₶\Ⅳ\Ⅸ\ɮ\ʫ\㏠\㋀\㍙] [l\‎\|\∣\⏽\￨1\‎\۱\𐌠\‎\𝟏\𝟙\𝟣\𝟭\𝟷I\Ｉ\Ⅰ\ℐ\ℑ\𝐈\𝐼\𝑰\𝓘\𝕀\𝕴\𝖨\𝗜\𝘐\𝙄\𝙸\Ɩ\ｌ\ⅼ\ℓ\𝐥\𝑙\𝒍\𝓁\𝓵\𝔩\𝕝\𝖑\𝗅\𝗹\𝘭\𝙡\𝚕\ǀ\Ι\𝚰\𝛪\𝜤\𝝞\𝞘\Ⲓ\І\Ӏ\‎\‎\‎\‎\‎\‎\‎\‎\ⵏ\ᛁ\ꓲ\𖼨\𐊊\𐌉\‎\‎\ł\ɭ\Ɨ\ƚ\ɫ\‎\‎\‎\‎\ŀ\Ŀ\ᒷ\🄂\⒈\‎\⒓\㏫\㋋\㍤\⒔\㏬\㍥\⒕\㏭\㍦\⒖\㏮\㍧\⒗\㏯\㍨\⒘\㏰\㍩\⒙\㏱\㍪\⒚\㏲\㍫\ǉ\Ĳ\‖\∥\Ⅱ\ǁ\‎\𐆙\⒒\Ⅲ\𐆘\㏪\㋊\㍣\Ю\⒑\㏩\㋉\㍢\ʪ\₶\Ⅳ\Ⅸ\ɮ\ʫ\㏠\㋀\㍙] [o\ం\ಂ\ം\ං\०\੦\૦\௦\౦\೦\൦\๐\໐\၀\‎\۵\ｏ\ℴ\𝐨\𝑜\𝒐\𝓸\𝔬\𝕠\𝖔\𝗈\𝗼\𝘰\𝙤\𝚘\ᴏ\ᴑ\ꬽ\ο\𝛐\𝜊\𝝄\𝝾\𝞸\σ\𝛔\𝜎\𝝈\𝞂\𝞼\ⲟ\о\ჿ\օ\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\ഠ\ဝ\𐓪\𑣈\𑣗\𐐬\‎\ø\ꬾ\ɵ\ꝋ\ө\ѳ\ꮎ\ꮻ\ꭴ\‎\ơ\œ\ɶ\∞\ꝏ\ꚙ\ൟ\တ]

This regexp will match "𝓗℮𝐥1೦".
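To show the general idea without the library, here is a rough sketch that builds one character class per letter from a lookup table. This is not the code from the repository above; the `LOOKALIKES` table is a tiny hand-picked example and `lookalike_pattern` is a hypothetical helper name. A real table would come from something like the confusables data.

```python
import re

# Hypothetical, hand-picked look-alikes; a real table would be far larger.
LOOKALIKES = {
    "h": "𝓗ℋ",
    "e": "℮е",
    "l": "𝐥1|",
    "o": "೦0ø",
}


def lookalike_pattern(word, table=LOOKALIKES):
    """Compile a regexp with one character class per letter of `word`."""
    parts = []
    for ch in word:
        # Accept the letter itself in either case, plus any listed look-alikes.
        candidates = ch.lower() + ch.upper() + table.get(ch.lower(), "")
        # re.escape keeps characters such as "|" or "]" literal inside the class.
        parts.append("[" + "".join(re.escape(c) for c in candidates) + "]")
    return re.compile("".join(parts))


pattern = lookalike_pattern("Hello")
print(bool(pattern.fullmatch("𝓗℮𝐥1೦")))  # True
print(bool(pattern.fullmatch("help")))   # False
```

Matching with a regexp like this avoids generating every possible spelling up front, which grows very quickly with word length.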

answered Oct 16 '22 07:10

Stan James

Tags:

python

unicode

glyph

profanity