I want to normalize the Unicode (UTF-8) strings posted by users through a <form>. Is there any library that handles this in Elixir (or in Phoenix or Erlang)? I'm used to doing it in Python as follows, but I don't know whether Elixir has such libraries.
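import unicodedata
import zenhan
import jctconv

def normalize(strings, unistr='NFKC'):
    # Unicode normalization (NFKC by default)
    norm = unicodedata.normalize(unistr, strings)
    # Full-width (zenkaku) to half-width (hankaku); the result gets its own
    # name so it no longer shadows the zenhan module inside the function
    hankaku = zenhan.z2h(norm, mode=2)
    # Katakana to hiragana
    katahira = jctconv.kata2hira(hankaku)
    return katahira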
Since Elixir 1.2 there is a String.normalize/2 function. I'm not sure what those Python libraries are doing, but this function is probably a good start for what you want to achieve.
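For the Japanese-specific steps in your Python snippet, here is a minimal sketch, not a drop-in library. It assumes Erlang/OTP 20 or later, where the :unicode module provides characters_to_nfkc_binary/1 (String.normalize/2 itself only lists :nfd and :nfc). NFKC already folds full-width ASCII letters and digits to half-width and half-width katakana to full-width, so the sketch assumes no separate zenhan-style step is needed for those characters. The katakana-to-hiragana part shifts code points U+30A1..U+30F6 down by 0x60, a common approach that may not match jctconv's behavior in every edge case. TextNormalizer, normalize/1 and kata_to_hira/1 are hypothetical names.

```elixir
defmodule TextNormalizer do
  # Hypothetical module: NFKC normalization followed by katakana -> hiragana.
  # Requires Erlang/OTP 20+ for :unicode.characters_to_nfkc_binary/1.

  # Katakana U+30A1..U+30F6 sit exactly 0x60 above hiragana U+3041..U+3096.
  @kana_offset 0x60

  def normalize(string) when is_binary(string) do
    # NFKC folds full-width ASCII to half-width and half-width katakana to
    # full-width (recombining voiced sound marks, e.g. "ｶﾞ" -> "ガ").
    case :unicode.characters_to_nfkc_binary(string) do
      normalized when is_binary(normalized) -> {:ok, kata_to_hira(normalized)}
      # Invalid UTF-8 input comes back as an error tuple instead of a binary.
      _error -> {:error, :invalid_unicode}
    end
  end

  # Shift each katakana code point in the range down into the hiragana block;
  # everything else (including the long vowel mark "ー") is left unchanged.
  def kata_to_hira(string) when is_binary(string) do
    for <<cp::utf8 <- string>>, into: "" do
      if cp in 0x30A1..0x30F6, do: <<(cp - @kana_offset)::utf8>>, else: <<cp::utf8>>
    end
  end
end
```

For example, TextNormalizer.normalize("ｶﾀｶﾅ ＡＢＣ１２３") should return {:ok, "かたかな ABC123"}. Returning an {:ok, _} / {:error, _} tuple lets a Phoenix controller reject form input that is not valid UTF-8 instead of crashing.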
If you type h String.normalize inside iex, you'll get the relevant documentation and some examples:
Converts all characters in binary to Unicode normalization form identified by form.
Forms
The supported forms are:
• :nfd - Normalization Form Canonical Decomposition. Characters are decomposed by canonical equivalence, and multiple combining characters are arranged in a specific order.
• :nfc - Normalization Form Canonical Composition. Characters are decomposed and then recomposed by canonical equivalence.
Examples
    iex> String.normalize("yêṩ", :nfd)
    "yêṩ"

    iex> String.normalize("leña", :nfc)
    "leña"
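Those outputs look unchanged because NFD and NFC differ in the underlying code points, not in what is displayed. A small sketch to make that visible, using the :unicode functions from Erlang/OTP 20+ (which, as far as I know, newer Elixir releases recommend over String.normalize/2; check the docs for your version). NormalizationDemo is a hypothetical module name.

```elixir
defmodule NormalizationDemo do
  # Precomposed "leña": U+00F1 LATIN SMALL LETTER N WITH TILDE.
  @composed "le\u00F1a"

  # Returns the NFD (decomposed) and NFC (recomposed) forms of the same word.
  def forms do
    nfd = :unicode.characters_to_nfd_binary(@composed)
    nfc = :unicode.characters_to_nfc_binary(nfd)
    {nfd, nfc}
  end
end

{nfd, nfc} = NormalizationDemo.forms()
# Same number of graphemes: "n" + U+0303 COMBINING TILDE is still one grapheme.
IO.inspect({String.length(nfd), String.length(nfc)})  # {4, 4}
# Different byte sizes: U+0303 takes 2 extra bytes, U+00F1 takes 2 in total.
IO.inspect({byte_size(nfd), byte_size(nfc)})          # {6, 5}
# So a plain == comparison fails unless both sides use the same form.
IO.inspect(nfd == nfc)                                # false
```

This is why it is worth normalizing user input once, before storing or comparing it.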