
How heavy are erlang events vs threads?

Not sure if this is advisable, but I am reading up on Erlang and was taking a look at gen_event, and I was wondering what the overhead is in using it for fully event-oriented programming, like I would in Node.js for example.

What is the overhead of having an event handler perform a task vs spawning a new thread in Erlang to do the same task?

Thanks.

asked Feb 23 '23 by BRampersad

1 Answer

Erlang the language doesn't expose threads; it gives you Erlang processes. These processes are scheduled efficiently by the Erlang runtime onto OS threads, which are typically mapped onto CPU cores. They're lightweight (less than 4 KB of memory footprint on the 32-bit VM, including the initial heap) and are scheduled pre-emptively, so blocking or heavy CPU consumption in any one of them doesn't deny any other process a fair share of CPU time.
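To get a feel for how cheap processes are, here's a small sketch (not part of the original answer; the module name and the count of 100000 are arbitrary) that spawns a large number of idle processes and inspects the memory footprint of one of them via `erlang:process_info/2`:

```erlang
-module(spawn_demo).
-export([run/0]).

run() ->
    %% Spawn 100000 processes that each just wait for a stop message.
    Pids = [spawn(fun() -> receive stop -> ok end end)
            || _ <- lists:seq(1, 100000)],
    %% Ask the VM how much memory one of these idle processes uses.
    {memory, Bytes} = erlang:process_info(hd(Pids), memory),
    io:format("~p processes alive, one uses ~p bytes~n",
              [length(Pids), Bytes]),
    %% Shut them all down again.
    [P ! stop || P <- Pids],
    ok.
```

On a typical VM this runs in well under a second, which is the point: per-request processes are a perfectly reasonable design.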

So don't be afraid to spawn a process to handle each of the requests you want to service in your system. It's a good initial design, usually gives you good throughput via parallelism, and tends to scale to more cores/CPUs/nodes more easily.

An additional benefit is that the code in each process can be written in a straightforward procedural manner:

%% Ask a server to perform a request and await the response from the worker.
request(Server, R) ->
    Server ! {new_request, R, self()},
    receive {response, Response} -> Response end.

%% Create a server.
start() ->
    spawn(?MODULE, server, []).

%% The server code
server() ->
    receive
        {new_request, R, Sender} ->
            %% Spawn a process to handle this request
            spawn(?MODULE, process_request, [R, Sender]),
            server()
    end.

%% The worker code
process_request(R, Sender) ->
    A = do_io(R),
    B = do_cpu_bound_thing(A),
    C = do_io(B),
    Sender ! {response, C}. % Return the response to the sender
    %% Process shuts down cleanly here as there's nothing more to do. 

Here we have two kinds of processes: a single central server process that accepts new requests, and any number of worker processes that actually do the work. Errors in individual requests do not affect the server process or other worker processes, and individual worker processes can run at different rates depending on IO and CPU resources.

From here it's easy to add: supervision of worker processes, so that we can restart individual requests that fail; multi-machine distributed processing, by adding a 'Node' argument to the spawn call that creates workers; timeouts, so that clients making requests don't block forever if the server is overloaded or a worker process fails; and so on.
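The timeout part is just a `receive ... after` clause. Here's a hypothetical variant of the `request/2` function above with an extra `Timeout` argument (the tagged `{ok, _}`/`{error, timeout}` return shape is my choice, not from the original code):

```erlang
%% Like request/2, but gives up after Timeout milliseconds so the
%% caller isn't blocked forever by an overloaded server or dead worker.
request(Server, R, Timeout) ->
    Server ! {new_request, R, self()},
    receive
        {response, Response} -> {ok, Response}
    after Timeout ->
        {error, timeout}
    end.
```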

You cannot get the parallelism the above code is capable of by using multiple handlers inside a gen_event process. The gen_event code would be more convoluted to read and you would have to interleave requests yourself instead of letting the runtime do it for you.


tl;dr: the overhead is so low and the other benefits so great that you should usually (almost always) spawn a process rather than try and do multiple things at once in a gen_event process.

answered Mar 08 '23 by archaelus