
Normalize strings in Elixir/Phoenix

I want to normalize the Unicode (UTF-8) strings that users post through a <form>. Is there a library that handles this in Elixir (or in Phoenix, or in Erlang)? In Python I would do it as follows, but I don't know whether Elixir has equivalent libraries.

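import unicodedata
import zenhan
import jctconv

def normalize(strings, unistr='NFKC'):
    # Apply Unicode normalization (NFKC by default).
    norm = unicodedata.normalize(unistr, strings)
    # Full-width to half-width; the result is named `half` so that it
    # does not shadow the zenhan module inside the function.
    half = zenhan.z2h(norm, mode=2)
    # Katakana to hiragana.
    katahira = jctconv.kata2hira(half)

    return katahira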
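
To make the intent concrete without the third-party packages, here is a rough stdlib-only sketch of the same pipeline. It is an approximation, not a drop-in replacement: it relies on NFKC alone for the width folding, and `kata_to_hira` is a hypothetical helper standing in for `jctconv.kata2hira`, assuming a fixed 0x60 code point offset between the katakana and hiragana blocks.

```python
import unicodedata

# Assumed offset between katakana (U+30A1..U+30F6) and the matching
# hiragana (U+3041..U+3096) code points.
KATA_TO_HIRA_OFFSET = 0x60


def kata_to_hira(text):
    """Hypothetical stand-in for jctconv.kata2hira: shift katakana to hiragana."""
    return "".join(
        chr(ord(ch) - KATA_TO_HIRA_OFFSET) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def normalize_stdlib(text, form="NFKC"):
    # NFKC folds full-width ASCII to half-width and
    # half-width katakana to full-width katakana.
    folded = unicodedata.normalize(form, text)
    return kata_to_hira(folded)
```

With this sketch, full-width "ＡＢＣ１２３" comes back as plain ASCII, and both "カタカナ" and half-width "ｶﾀｶﾅ" come back as hiragana. That is the behavior I am looking for on the Elixir side.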
hykw asked Sep 25 '22 17:09


2 Answers

Since Elixir 1.2 there is a String.normalize/2 function. I'm not sure what those Python libraries are doing, but this function is probably a good start for what you want to achieve.

michalmuskala answered Nov 15 '22 09:11


If you type h String.normalize inside iex, you'll get the right information and some examples.

Converts all characters in binary to Unicode normalization form identified by form.

Forms

The supported forms are:

  • :nfd - Normalization Form Canonical Decomposition. Characters are
    decomposed by canonical equivalence, and multiple combining characters are
    arranged in a specific order.
  • :nfc - Normalization Form Canonical Composition. Characters are
    decomposed and then recomposed by canonical equivalence.

Examples

┃ iex> String.normalize("yêṩ", :nfd)
┃ "yêṩ"
┃
┃ iex> String.normalize("leña", :nfc)
┃ "leña"
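
The results above render the same as their inputs, so the effect is easier to see in the code points. As a small sketch in Python (the language used in the question), with only the standard `unicodedata` module and a hypothetical `code_points` helper, the same two forms behave like this:

```python
import unicodedata


def code_points(text):
    """Return the code points of text as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


composed = "\u00f1"  # ñ as a single code point
decomposed = unicodedata.normalize("NFD", composed)    # n + combining tilde
recomposed = unicodedata.normalize("NFC", decomposed)  # back to one code point

print(code_points(composed))    # ['U+00F1']
print(code_points(decomposed))  # ['U+006E', 'U+0303']
print(code_points(recomposed))  # ['U+00F1']
```

The forms are defined by the Unicode standard rather than by either language, so String.normalize/2 with :nfd and :nfc should follow the same pattern.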
Alberto Romero answered Nov 15 '22 07:11