So I'm trying to convert a binary to a string. This code:
t = [{<<71,0,69,0,84,0>>}]
String.from_char_list(t)
But I'm getting this when I try this conversion:
** (ArgumentError) argument error
(stdlib) :unicode.characters_to_binary([{<<70, 0, 73, 0, 78, 0>>}])
(elixir) lib/string.ex:1161: String.from_char_list/1
I'm assuming the <<70, 0, etc. is likely a list of graphemes (it's the return from an API call and the API is not quite documented) but do I need to specify the encoding somehow?
I know I'm likely missing something obvious (maybe that's not the right function to use?) but I can't seem to figure out what to do here.
EDIT:
For what it's worth, the binary above is the return value of an Erlang ODBC call. After a little more digging I found that the binary in question is actually a "Unicode binary encoded as UTF16 little endian" (see here: http://www.erlang.org/doc/apps/odbc/odbc.pdf pg. 9 re: SQL_WVARCHAR). It doesn't really change the issue, but it does add some context.
There's a couple of things here:
1.) You have a list with a tuple containing one element, a binary. You can probably just extract the binary and have your string. Passing the current data structure to to_string is not going to work.
2.) The binary you used in your example contains 0, an unprintable character. In the shell it will therefore be displayed as a raw binary rather than as a string, because Elixir can't tell the difference between a plain binary and a binary representing a string when that binary contains unprintable characters.
3.) You can use pattern matching to convert a binary to a particular type. For instance:
iex> raw = <<71,32,69,32,84,32>>
iex> Enum.join(for <<c::utf8 <- raw>>, do: <<c::utf8>>)
"G E T "
iex> <<c::utf8, _::binary>> = raw
"G E T "
iex> <<c::utf8>>
"G"
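Applied to the data in the question, points 1 and 3 could be combined roughly as follows. This is a minimal sketch, assuming the binary really is UTF-16 little endian as the ODBC documentation states for SQL_WVARCHAR; the variable names are only illustrative. It uses Erlang's :unicode.characters_to_binary/2, which takes the input encoding as its second argument and produces UTF-8.

```elixir
# The shape shown in the question: a list holding a one-element tuple.
t = [{<<71, 0, 69, 0, 84, 0>>}]

# Point 1: pattern match to pull the binary out of the list and the tuple.
[{utf16}] = t

# Assumption: the bytes are UTF-16 little endian (SQL_WVARCHAR).
# Convert from that encoding to a UTF-8 Elixir string.
string = :unicode.characters_to_binary(utf16, {:utf16, :little})

IO.inspect(string)
# "GET"
```

If the input were not valid UTF-16, :unicode.characters_to_binary/2 would return an error or incomplete tuple instead of a binary, so real code would probably want to match on the result.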
Also, if you are getting binary data from a network connection, you probably want to use :erlang.iolist_to_binary, since the data will be an iolist, not a charlist. The difference is that iolists can contain binaries and nested lists as well as plain integers, whereas charlists are always just a flat list of integers. If you call to_string on an iolist, it may fail.
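To make the iolist/charlist distinction concrete, here is a small sketch; the sample data is made up for illustration. It shows that integers in an iolist are raw bytes, while integers in a charlist are Unicode code points, so the same list can produce different binaries depending on which function flattens it.

```elixir
# An iolist can mix binaries, nested lists, and integers (bytes 0..255).
iolist = ["GE", [?T, [" ", "/"]], 10]
binary = :erlang.iolist_to_binary(iolist)
IO.inspect(binary)
# "GET /\n"

# The integer 200 is a single byte when treated as iodata...
as_bytes = :erlang.iolist_to_binary([200])
IO.inspect(as_bytes)
# <<200>>

# ...but a code point (U+00C8) when treated as a charlist,
# which takes two bytes once encoded as UTF-8.
as_chars = List.to_string([200])
IO.inspect(as_chars, binaries: :as_binaries)
# <<195, 136>>
```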
I made a function to convert a binary to a string:
def raw_binary_to_string(raw) do
  raw
  |> String.codepoints()
  # Start from an empty string so the first element is converted as well.
  |> Enum.reduce("", fn w, result ->
    if String.valid?(w) do
      result <> w
    else
      # Not valid UTF-8: take the single byte and re-encode it as a code point.
      <<parsed::8>> = w
      result <> <<parsed::utf8>>
    end
  end)
end
Executed in the iex console:
iex(6)> raw = <<65, 241, 111, 32, 100, 101, 32, 70, 97, 99, 116, 117, 114, 97, 99, 105, 111, 110, 32, 65, 99, 116, 117, 97, 108>>
iex(7)> raw_binary_to_string(raw)
"Año de Facturacion Actual"
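If the whole binary is known to be Latin-1 (ISO-8859-1), which is what the byte 241 for "ñ" suggests here, a shorter route may be Erlang's :unicode.characters_to_binary/2 with :latin1 as the input encoding. This is a sketch under that assumption. Note that it is not equivalent to the function above for mixed input: it treats every byte as Latin-1, whereas the function above keeps any sequences that are already valid UTF-8.

```elixir
raw = <<65, 241, 111, 32, 100, 101, 32, 70, 97, 99, 116, 117, 114, 97, 99,
        105, 111, 110, 32, 65, 99, 116, 117, 97, 108>>

# Assumption: every byte in raw is a Latin-1 character.
# Each byte is re-encoded as the matching Unicode code point in UTF-8.
string = :unicode.characters_to_binary(raw, :latin1)

IO.puts(string)
# Año de Facturacion Actual
```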
Not sure if OP has since solved his problem, but in relation to his remark about his binary being utf16-le: for specifically that encoding, I found that the quickest (and to those more experienced with Elixir, probably-hacky) way was to use Enum.reduce:
# coercing it into utf8 gives us ["D", <<0>>, "e", <<0>>, "v", <<0>>, "a", <<0>>, "s", <<0>>, "t", <<0>>, "a", <<0>>, "t", <<0>>, "o", <<0>>, "r", <<0>>]
<<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0>>
|> String.codepoints()
|> Enum.reduce("", fn codepoint, result ->
  <<parsed::8>> = codepoint
  # Drop the zero bytes and keep everything else.
  if parsed == 0, do: result, else: result <> <<parsed>>
end)
# "Devastator"
|> IO.puts()
Assumptions:
utf16-le encoding
the codepoints are backwards-compatible with utf8, i.e. they use only 1 byte
Since I'm still learning Elixir, it took me a while to get to this solution. I looked into other libraries people made, even using something like iconv at a bash level.
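For what it's worth, the one-byte assumption can probably be dropped by letting binary pattern matching decode the UTF-16 directly. The sketch below is not the approach above but an alternative: a binary comprehension with the utf16-little segment type, which reads each code point from the input and writes it back out as UTF-8.

```elixir
raw = <<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0>>

# Read each code point as UTF-16 little endian and emit it as UTF-8,
# collecting the pieces into a single binary.
string = for <<c::utf16-little <- raw>>, into: "", do: <<c::utf8>>

IO.puts(string)
# Devastator

# Code points above 127 work too, e.g. U+00E9 ("é") is <<233, 0>> in UTF-16LE.
accented = for <<c::utf16-little <- <<233, 0>> >>, into: "", do: <<c::utf8>>
```

One caveat: the comprehension stops quietly at the first segment that does not match, so malformed or truncated input would yield a shortened string rather than an error.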