Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the best way to build long strings in Elixir

Tags:

elixir

I want to build a relatively long string (an SVG if that matters) and return it from a function. What is the best way to build such a string in Elixir? In other languages I'd use something like a StringBuilder class. Is there an equivalent thing in Elixir?

You can append strings with the <> operator, but isn't that just a list append? It seems like doing a lot of those would get really inefficient.

like image 428
Matt Avatar asked Sep 07 '17 12:09

Matt


People also ask

How do you reverse an Elixir String?

In Elixir the to_string function can convert a number into a string. There is already a function in the String module for reversing a string called String. reverse . Strings can be converted into integers using the String.

Is Elixir a String?

Strings in Elixir are UTF-8 encoded binaries. Strings in Elixir are a sequence of Unicode characters, typically written between double quoted strings, such as "hello" and "héllò" .


2 Answers

Erlang and Elixir have another type of strings called IO lists, which are lists that consist of a number of binaries (Strings in elixir) and IO lists, so it's a recursive type.

Examples:

helloworld = ["Hello" | ["World"]]
sentence = ["A ", "small", " bike" | [" is", " red."]]

paragraph = [helloworld, ". ", sentence]

As you can see, we combined the two IO lists helloworld and sentence to another IO list paragraph. This happened without copying or appending any data to the actual binaries, as the new list paragraph internally just contains address locations of the other lists and the ". " string.

This allows you to cheaply combine huge lists of binaries. A lot of Elixir and Erlang functions work with IO lists, for example, Phoenix' EEx templates return IO lists which are eventually passed to the socket to send out, while still being IO lists.

Poison's encoder will also internally use IO lists so it won't have to combine the strings for every list or object it encodes.

like image 41
narrowtux Avatar answered Oct 24 '22 09:10

narrowtux


Elixir Strings (called Binaries in Erlang) are represented as a contiguous sequence of bytes in memory, not a Linked List of bytes/characters, if that's what you mean by "list". They are immutable and a naive implementation of appending two binaries will be O(n+m) if the strings are of length n and m, but the Erlang VM optimizes the use case of building a large string: if you have two strings, a and b, and a has free memory after its allocation, and you concatenate them (a <> b), the VM only copies b and reuses the old value of a. This optimization will for obvious reasons not be applied if you later concatenate another string c to a, but this optimization itself is enough to make the task of building a large binary as efficient as in languages with mutable strings. This optimization is explained in detail here.

Here's a demo of this optimization in action. In the first example, I'm creating a 10,000,000 byte string by appending to a base value. In the second example, I'm creating a 500,000 byte string by prepending to a base value, which takes over 10x more time than appending 10,000,000 bytes. In a naive implementation, both would take the same amount of time.

{time, _} = :timer.tc(fn ->
  Enum.reduce(1..10_000_000, "", fn _, acc -> acc <> "." end)
end)

IO.inspect time

{time, _} = :timer.tc(fn ->
  Enum.reduce(1..500_000, "", fn _, acc -> "." <> acc end)
end)

IO.inspect time
683621
7807815

In short, you should be fine building large string as long as you're only appending the value.


If you're going to write the resulting string to a socket or stream or similar, you might be able to do it significantly faster by creating iolists instead of a flat string. More info about those here and here.

like image 178
Dogbert Avatar answered Oct 24 '22 09:10

Dogbert