
Erlang and binary with Cyrillic

I need to be able to use binaries with Cyrillic characters in them. I tried just writing <<"абвгд">> but I got a badarg error.

How can I work with Cyrillic (or, more generally, Unicode) strings in Erlang?

0xAX asked May 15 '12 07:05


2 Answers

If you want to enter the expression above in the Erlang shell, read the user's guide for the unicode module. The functions unicode:characters_to_binary and unicode:characters_to_list are inverses of each other. Here is an example:

([email protected])37> io:getopts().
[{expand_fun,#Fun<group.0.33302583>},
 {echo,true},
 {binary,false},
 {encoding,unicode}]

([email protected])40> A = unicode:characters_to_binary("上海").
<<228,184,138,230,181,183>>

([email protected])41> unicode:characters_to_list(A).
[19978,28023]

([email protected])45> io:format("~s~n",[ unicode:characters_to_list(A,utf8)]).
** exception error: bad argument
     in function  io:format/3
        called as io:format(<0.30.0>,"~s~n",[[19978,28023]])

([email protected])46> io:format("~ts~n",[ unicode:characters_to_list(A,utf8)]).
上海
ok

Note that the plain ~s directive fails on code points above 255; the Unicode-aware ~ts directive is needed to print them.

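If you want to use unicode:characters_to_binary("上海"). directly in source code, it is a little more complicated, because the result depends on how the compiler decodes the source file. Try it first and compare the result with the shell output above to see the difference.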
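
One way to sidestep the source-encoding question is to spell the characters out as integer code points, so the source file stays plain ASCII. The following is a minimal sketch, not a definitive recipe; the module name `cyr` and its function names are hypothetical and chosen only for illustration. It builds the UTF-8 binary for "абвгд" (code points 1072 to 1076) in two ways and decodes it back.

```erlang
-module(cyr).
-export([cyrillic/0, cyrillic_segments/0, roundtrip/0]).

%% "абвгд" given as Unicode code points (U+0430..U+0434),
%% so this file contains only ASCII characters.
cyrillic() ->
    unicode:characters_to_binary([1072, 1073, 1074, 1075, 1076]).

%% The same binary built with the utf8 segment type of the bit syntax.
cyrillic_segments() ->
    <<1072/utf8, 1073/utf8, 1074/utf8, 1075/utf8, 1076/utf8>>.

%% Decode the UTF-8 binary back into a list of code points.
roundtrip() ->
    unicode:characters_to_list(cyrillic(), utf8).
```

The result can be printed with io:format("~ts~n", [cyr:cyrillic()]), assuming the shell's encoding is unicode as in the io:getopts() output above.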

Chen Yu answered Oct 17 '22 23:10


The Erlang compiler interprets source code as ISO-8859-1 encoded text, which limits literals to Latin characters. You may be able to type in ISO-8859-1 characters whose bytes happen to match the Unicode encoding you want, but this is not a good idea.

Make sure your editor reads and writes ISO-8859-1, and avoid non-Latin literals as much as possible. Load such strings from files instead.
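
Loading strings from a file could look roughly like the sketch below. It assumes the file is UTF-8 encoded; the module name `strings_from_file`, the function `load/1`, and the error atoms are hypothetical names introduced for this example.

```erlang
-module(strings_from_file).
-export([load/1]).

%% Read a UTF-8 encoded file and return its contents
%% as a list of Unicode code points.
load(Path) ->
    {ok, Bin} = file:read_file(Path),
    case unicode:characters_to_list(Bin, utf8) of
        Chars when is_list(Chars) ->
            {ok, Chars};
        {error, _Decoded, _Rest} ->
            %% The file contains bytes that are not valid UTF-8.
            {error, invalid_utf8};
        {incomplete, _Decoded, _Rest} ->
            %% The file ends in the middle of a multi-byte character.
            {error, truncated_utf8}
    end.
```

The returned list can be printed with the ~ts directive or turned back into a binary with unicode:characters_to_binary/1.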

dsmith answered Oct 18 '22 01:10