Converting a UTF-16LE Elixir bitstring into an Elixir String

Given an Elixir bitstring encoded in UTF-16LE:

<<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0, 0, 0>>

How can I convert this into a readable Elixir String? (It spells out "Devastator", followed by a null terminator.) The closest I've gotten is turning it into a list of the Unicode codepoints as hex strings (["0044", "0065", ...]) and trying to prepend the \u escape sequence to each of them, but Elixir raises an error because that is an invalid escape sequence. I'm out of ideas.
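About the codepoint-list attempt: `\u` is an escape that the compiler resolves inside string literals, so it cannot be glued onto a string at runtime. A minimal sketch, assuming the codepoints are hex strings like the ones shown, parses each one with `String.to_integer/2` (base 16) and builds the string with `List.to_string/1`. The module `CodepointList` and function `from_hex/1` are hypothetical names for illustration.

```elixir
defmodule CodepointList do
  # Hypothetical helper: turns hex codepoint strings such as "0044"
  # into a UTF-8 Elixir string.
  def from_hex(hex_codepoints) do
    hex_codepoints
    # "0044" -> 68 (parse as base-16 integer)
    |> Enum.map(&String.to_integer(&1, 16))
    # A list of integer codepoints is valid chardata; List.to_string/1
    # encodes each codepoint as UTF-8.
    |> List.to_string()
  end
end

IO.puts(CodepointList.from_hex(["0044", "0065", "0076"]))
# prints Dev
```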

asked Sep 29 '16 at 14:09 by user701847


2 Answers

The simplest way is to use the Erlang :unicode module, which can decode UTF-16 directly into a UTF-8 binary (an Elixir String):

:unicode.characters_to_binary(utf16binary, {:utf16, :little})
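If the input might not be valid UTF-16, keep in mind that `:unicode.characters_to_binary/2` returns an `{:error, converted, rest}` or `{:incomplete, converted, rest}` tuple instead of a binary. A minimal sketch of a wrapper that normalizes those results; the module `Utf16` and the `{:ok, ...}`/`{:error, ...}` shapes are hypothetical choices, not part of any library.

```elixir
defmodule Utf16 do
  # Hypothetical wrapper around :unicode.characters_to_binary/2.
  # Returns {:ok, utf8} or {:error, reason, converted_so_far}.
  def decode_le(binary) when is_binary(binary) do
    case :unicode.characters_to_binary(binary, {:utf16, :little}) do
      # Whole input converted: the result is a UTF-8 binary.
      utf8 when is_binary(utf8) ->
        {:ok, utf8}

      # Invalid UTF-16 (e.g. a lone low surrogate); `converted` holds the
      # UTF-8 for everything before the bad code unit.
      {:error, converted, _rest} ->
        {:error, :invalid, converted}

      # Input ends mid-character (e.g. an odd trailing byte).
      {:incomplete, converted, _rest} ->
        {:error, :incomplete, converted}
    end
  end
end
```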

For example:

<<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0, 0, 0>>
|> :unicode.characters_to_binary({:utf16, :little})
|> IO.puts
#=> Devastator

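(The input ends with a UTF-16 null character, <<0, 0>>, so the result is actually <<"Devastator", 0>>. Because of that trailing null byte, IEx displays the result as a raw binary (<<68, 101, 118, ...>>) rather than as a string, and depending on the OS and terminal, IO.puts may print some visible representation of the null byte.)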
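If the null terminator is not wanted, one option is to strip it after decoding. A minimal sketch, assuming the terminator only ever appears at the end: after conversion it is a single 0 byte, which `String.trim_trailing/2` removes (along with any further trailing 0 bytes, e.g. from a padded fixed-width buffer).

```elixir
utf16 = <<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0, 0, 0>>

# Decode to UTF-8, then drop trailing NUL bytes so IEx shows a normal string.
name =
  utf16
  |> :unicode.characters_to_binary({:utf16, :little})
  |> String.trim_trailing(<<0>>)

IO.inspect(name)
# prints "Devastator"
```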

answered Sep 17 '22 at 23:09 by michalmuskala


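You can also use Elixir's binary pattern matching, specifically the utf16-little modifier in <<codepoint::utf16-little>>, which decodes one codepoint (including surrogate pairs) at a time: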

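defmodule Convert do
  # Public entry point: start with an empty UTF-8 accumulator.
  def utf16le_to_utf8(binary), do: utf16le_to_utf8(binary, "")

  # Decode one UTF-16LE codepoint (a surrogate pair counts as one)
  # and append it to the accumulator, re-encoded as UTF-8.
  defp utf16le_to_utf8(<<codepoint::utf16-little, rest::binary>>, acc) do
    utf16le_to_utf8(rest, <<acc::binary, codepoint::utf8>>)
  end

  # All input consumed: return the UTF-8 result.
  defp utf16le_to_utf8("", acc), do: acc
end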

<<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0, 0, 0>>
|> Convert.utf16le_to_utf8
|> IO.puts

<<192, 3, 114, 0, 178, 0>>
|> Convert.utf16le_to_utf8
|> IO.puts

Output:

Devastator
πr²
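The same decoding can be sketched more compactly with a binary comprehension: a bitstring generator using the `utf16-little` modifier yields one codepoint per step, and `into: ""` collects the re-encoded UTF-8 pieces into a binary. The module name `ConvertFor` is hypothetical; for valid input it should produce the same result as `Convert.utf16le_to_utf8/1` above, including the trailing null byte.

```elixir
defmodule ConvertFor do
  # Hypothetical variant of Convert.utf16le_to_utf8/1: each
  # <<cp::utf16-little>> match yields one codepoint, which is
  # re-encoded as UTF-8 and appended to the "" accumulator.
  def utf16le_to_utf8(binary) do
    for <<cp::utf16-little <- binary>>, into: "", do: <<cp::utf8>>
  end
end

<<192, 3, 114, 0, 178, 0>>
|> ConvertFor.utf16le_to_utf8()
|> IO.puts()
# prints πr²
```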
answered Sep 20 '22 at 23:09 by Dogbert