I would like a function to remove accents from a string. Example input/output:
regardé -> regarde
fête -> fete
The text-icu library contains a variety of Unicode utilities. We will also need the text library in order to convert our Strings to Text. I installed them by adding the following two lines to build-depends in my cabal file:
build-depends:    --- other packages...
                  , text-icu >= 0.7.0.1 && < 1
                  , text
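With those dependencies installed, we can remove accents with the following process:
1. Convert the String to Text.
2. Normalize the Text to NFD (canonical decomposition), which splits each accented character into a base character followed by its combining marks.
3. Filter out the characters that have the Diacritic property.
4. Convert the Text back to a String.
Keeping all that in mind, we come up with the following function: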
import qualified Data.Text as T
import Data.Text.ICU.Char
import Data.Text.ICU.Normalize

-- Remove accents: decompose to NFD, then drop the diacritic code points.
canonicalForm :: String -> String
canonicalForm s = T.unpack noAccents
  where
    noAccents = T.filter (not . property Diacritic) normalizedText
    normalizedText = normalize NFD (T.pack s)
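The NFD step is what makes the filter work: a precomposed character such as ê is a single code point, so there is nothing to filter until it has been split into a base letter plus a combining mark. Here is a minimal sketch of that idea using only base. Note the assumptions: `dropMarks`, `precomposed` and `decomposed` are hypothetical names, the decomposed input is written by hand because base has no normalizer, and `isMark` is only a stand-in for ICU's Diacritic property (the two do not cover exactly the same characters).

```haskell
import Data.Char (isMark)

-- Stand-in for the filter step: drop combining marks
-- (general categories Mn, Mc, Me). This is NOT ICU's Diacritic property.
dropMarks :: String -> String
dropMarks = filter (not . isMark)

-- "fête" with a precomposed U+00EA: one code point, nothing to strip.
precomposed :: String
precomposed = "f\xEAte"

-- "fête" already decomposed by hand: 'e' followed by U+0302
-- COMBINING CIRCUMFLEX ACCENT, which is what NFD would produce.
decomposed :: String
decomposed = "fe\x0302te"
```

Applying `dropMarks` to `decomposed` removes the combining circumflex, while `precomposed` passes through unchanged, which is why the function above normalizes before filtering.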
If you don't need to convert from a String, you can skip the T.pack and T.unpack calls.
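For illustration, a Text-only version might look like the sketch below. It assumes the same text-icu and text dependencies as above; `stripAccents` is a hypothetical name, not something the libraries provide.

```haskell
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.ICU.Char (Bool_ (Diacritic), property)
import Data.Text.ICU.Normalize (NormalizationMode (NFD), normalize)

-- Same pipeline as canonicalForm, minus the String conversions:
-- decompose to NFD, then drop code points with the Diacritic property.
stripAccents :: Text -> Text
stripAccents = T.filter (not . property Diacritic) . normalize NFD
```

This keeps the whole pipeline in Text, which avoids two conversions when the surrounding code already works with Text.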