
How do you remove accents from a string in Haskell?

Tags:

haskell

I would like a function to remove accents from a string. Example input/output:

regardé -> regarde
fête    -> fete
Adam Hammes asked May 31 '17 16:05


1 Answer

The text-icu library contains a variety of Unicode utilities. We will also need the text library in order to convert our Strings to Text. I installed them by adding the following two lines to build-depends in my cabal file:

build-depends:     --- other packages...
                   , text-icu >= 0.7.0.1 && < 1
                   , text

With those dependencies installed, we can remove accents with the following process:

  1. Convert the input String to Text
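  2. Normalize the input to NFD, so that each accented character is decomposed into a base letter followed by separate combining marks (see the documentation for why this is necessary)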
  3. Filter out the accents
  4. Convert back to String.
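
To see why step 2 matters, here is a minimal sketch that compares a string before and after NFD normalization. The helper names lengthsBeforeAfter and hasDiacritic are made up for this illustration; the sketch assumes the same text and text-icu dependencies listed above.

```haskell
import qualified Data.Text as T
import Data.Text.ICU.Char (Bool_ (Diacritic), property)
import Data.Text.ICU.Normalize (NormalizationMode (NFD), normalize)

-- Hypothetical helper: number of code points before and after NFD.
lengthsBeforeAfter :: T.Text -> (Int, Int)
lengthsBeforeAfter t = (T.length t, T.length (normalize NFD t))

-- Hypothetical helper: does any code point carry the Diacritic property?
hasDiacritic :: T.Text -> Bool
hasDiacritic = T.any (property Diacritic)

-- Precomposed 'é' (U+00E9), written as an escape to avoid encoding issues.
eAcute :: T.Text
eAcute = T.pack "\233"
```

In GHCi, lengthsBeforeAfter eAcute should give (1,2): the precomposed character is one code point, and NFD splits it into 'e' plus the combining acute accent U+0301. Likewise, hasDiacritic eAcute should be False while hasDiacritic (normalize NFD eAcute) should be True, which is why filtering without normalizing first would leave the accent in place.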

Keeping all that in mind, we come up with the following function:

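import qualified Data.Text as T
import Data.Text.ICU.Char (Bool_ (Diacritic), property)
import Data.Text.ICU.Normalize (NormalizationMode (NFD), normalize)

-- Strip accents: decompose to NFD, then drop every diacritic code point.
canonicalForm :: String -> String
canonicalForm s = T.unpack noAccents
  where
    -- Keep only the characters that lack the Unicode Diacritic property.
    noAccents = T.filter (not . property Diacritic) normalizedText
    -- NFD splits e.g. 'é' into 'e' plus a combining acute accent.
    normalizedText = normalize NFD (T.pack s)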

If you don't need to convert from a String, you can skip the T.pack and T.unpack calls.
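
As a rough sketch of that Text-only variant (the name stripAccents is hypothetical, and the imports assume the same text-icu modules as the function above):

```haskell
import qualified Data.Text as T
import Data.Text.ICU.Char (Bool_ (Diacritic), property)
import Data.Text.ICU.Normalize (NormalizationMode (NFD), normalize)

-- Hypothetical name: same logic as canonicalForm, minus the String round trip.
-- Decompose to NFD first, then drop every code point marked Diacritic.
stripAccents :: T.Text -> T.Text
stripAccents = T.filter (not . property Diacritic) . normalize NFD
```

With the inputs from the question, stripAccents (T.pack "regardé") should give "regarde" and stripAccents (T.pack "fête") should give "fete". One caveat worth checking against your data: the Unicode Diacritic property also covers a few standalone characters such as ^ (U+005E) and ` (U+0060), so this filter drops those as well.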

Adam Hammes answered Oct 05 '22 07:10