In Latin script, letters have an upper case and a lower case form. In Python, if you want to compare two strings without regard to their case, you can convert them to the same case using 'string'.upper() or 'string'.lower().
In Arabic script, letters can have an initial, medial, or final form. Is there a similar way to compare strings of Arabic characters without caring which form the letters are in?
There are two parts to this, which should work for all languages:* normalize both strings to a compatibility form (NFKD), then casefold them.
Between the two, this handles English upper and lower case, Arabic initial/medial/final (plus isolated), German ß vs. ss, é as a single code point vs. e\N{COMBINING ACUTE ACCENT}, Chinese rotated characters, Japanese half-width kana, and probably all kinds of other things you haven't thought of.
In Python, that looks like this:
>>> import unicodedata
>>> s1 = 'ﻧ'
>>> s2 = 'ﻨ'
>>> unicodedata.normalize('NFKD', s1).casefold() == unicodedata.normalize('NFKD', s2).casefold()
True
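If you do this in more than one place, it may be worth wrapping the two steps in a helper. The following is only a minimal sketch: the names caseless_key and caseless_equal are made up for illustration, and it applies the same normalize-then-casefold recipe as the snippet above, nothing more thorough.

```python
import unicodedata


def caseless_key(s):
    """Return a comparison key: NFKD-normalize, then casefold.

    Hypothetical helper name; it just bundles the two steps shown above.
    """
    return unicodedata.normalize('NFKD', s).casefold()


def caseless_equal(a, b):
    """Compare two strings ignoring case and compatibility differences."""
    return caseless_key(a) == caseless_key(b)


# Arabic NOON, initial form vs. medial form
print(caseless_equal('\uFEE7', '\uFEE8'))
# German sharp s vs. "ss"
print(caseless_equal('Stra\u00DFe', 'STRASSE'))
# Precomposed e-acute vs. "e" + COMBINING ACUTE ACCENT
print(caseless_equal('\u00E9', 'e\N{COMBINING ACUTE ACCENT}'))
# Half-width katakana KA vs. full-width katakana KA
print(caseless_equal('\uFF76', '\u30AB'))
```

The key function can also be passed to sorted() or used as a dict key when you want to group strings that differ only in case or presentation form.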
Note that casefold wasn't added until Python 3.3. If you're using an earlier version of Python, there are implementations on PyPI; using them should be similar to using the 3.3+ builtin.
If you're interested in exactly how this works for Arabic, rather than just the fact that it works for Arabic along with every other language, you have to read the algorithms and tables at unicode.org. IIRC, the W3C document that recommends doing this explains why it works using Arabic as an example. I believe it's because Unicode treats initial, medial, final, and isolated as compatibility-equivalent presentation forms of the same character, so compatibility normalization maps each of them to the same base letter, even though casefolding directly on a presentation form just returns the character itself.
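You can poke at this yourself with the unicodedata module instead of reading the tables. This is just an exploratory sketch using the four presentation forms of the letter NOON (written as escapes so the code points are unambiguous); the variable names are arbitrary.

```python
import unicodedata

# The four presentation forms of ARABIC LETTER NOON:
# isolated, final, initial, medial
forms = ['\uFEE5', '\uFEE6', '\uFEE7', '\uFEE8']

for ch in forms:
    # decomposition() shows the compatibility tag and the target code point
    print('U+%04X' % ord(ch),
          unicodedata.name(ch),
          '->',
          unicodedata.decomposition(ch))

# NFKD collapses every form to one and the same base letter
normalized = {unicodedata.normalize('NFKD', ch) for ch in forms}
print(['U+%04X' % ord(c) for c in normalized])

# casefold() on its own leaves each presentation form untouched
unchanged = all(ch.casefold() == ch for ch in forms)
print(unchanged)
```

So for Arabic the normalization step is doing the real work; the casefold step matters for scripts that actually have case.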
* There are a few cases where two different languages or cultures use the same script, but have different case-folding rules; in that case, you need locale-specific casefolding, which Python doesn't include. But that shouldn't be relevant here.
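To illustrate the footnote: the usual example is Turkish, where dotted and dotless i are separate letters. The sketch below shows that str.casefold() always applies the default rules, and how you might pre-map the two special letters by hand. turkic_casefold is a hypothetical helper written for this example, not something Python or any library provides.

```python
def turkic_casefold(s):
    """Hypothetical helper: casefold using Turkish/Azeri rules for I.

    Maps capital dotted I (U+0130) to 'i' and plain capital 'I' to
    dotless i (U+0131) before applying the default casefold.
    """
    return s.replace('\u0130', 'i').replace('I', '\u0131').casefold()


word_upper = 'ISPARTA'          # Turkish city name, upper case
word_lower = '\u0131sparta'     # same name, lower case with dotless i

# Default folding turns 'I' into dotted 'i', so these do not match
print(word_upper.casefold() == word_lower.casefold())

# With the Turkic pre-mapping they do match
print(turkic_casefold(word_upper) == turkic_casefold(word_lower))
```

You would only want this when you know the text is Turkish or Azeri; applied to English text it would fold 'I' to the wrong letter.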