This question is a continuation of Java string searching ignoring accents.
The answer to the original question shows us how to remove the diacritics from strings. So, for instance, köln becomes koln. But łódź becomes łodz - note the l with stroke.
My question is: how can I remove the stroke as well, so that łódź becomes lodz?
Thanks.
You cannot, at least not trivially for all such letters. In Unicode, ł has no decomposition into l plus a combining mark, so normalization leaves it untouched. Apart from its appearance and its Unicode name, it is not linked to l at all (linguistically that's a different matter).
Your only option might be a conversion table for your use case you can fill with all the characters you need to convert.
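A minimal sketch of such a conversion table in Java, assuming Java 9+ for `Map.of`. The class name `StrokeFolding`, the method `fold`, and the particular letters in the table are my own choices for illustration, not a complete list; fill the table with whatever your data actually contains. It first strips combining marks via NFD normalization, then looks up whatever is left in the table.

```java
import java.text.Normalizer;
import java.util.Map;

final class StrokeFolding {
    // Illustrative table only (an assumption, not exhaustive):
    // letters that NFD normalization does not decompose.
    private static final Map<Character, String> TABLE = Map.of(
        '\u0142', "l",  // ł
        '\u0141', "L",  // Ł
        '\u00F8', "o",  // ø
        '\u00D8', "O",  // Ø
        '\u0111', "d",  // đ
        '\u0110', "D"   // Đ
    );

    static String fold(String input) {
        // Step 1: decompose (ó becomes o + combining acute) and drop the combining marks.
        String stripped = Normalizer.normalize(input, Normalizer.Form.NFD)
                                    .replaceAll("\\p{M}+", "");

        // Step 2: replace the remaining letters that have an entry in the table.
        StringBuilder out = new StringBuilder(stripped.length());
        for (int i = 0; i < stripped.length(); i++) {
            char c = stripped.charAt(i);
            String replacement = TABLE.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this table, `StrokeFolding.fold("łódź")` gives `lodz`. Characters that are neither decomposable nor in the table pass through unchanged, so a missing entry shows up as an unconverted letter rather than as data loss.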
As tchrist suggested, I tried ICU (version 50.1): it didn't treat ł as derived from l either. The L with stroke seems to be a special case in Unicode. See http://bugs.mysql.com/bug.php?id=11369, where they say that in Unicode 4.0 it was not connected to L, while in Unicode 4.1 it is. I wonder whether anyone has tested the problem with a Unicode 4.1-based Java library.