Erlang OTP application design

I am struggling a little coming to grips with the OTP development model as I convert some code into an OTP app.

I am essentially making a web crawler and I just don't quite know where to put the code that does the actual work.

I have a supervisor which starts my worker:

-behaviour(supervisor).
-define(CHILD(I, Type), {I, {I, start_link, []}, permanent, 5000, Type, [I]}).

init(_Args) ->          
  Children = [
    ?CHILD(crawler, worker)
  ],  
  RestartStrategy = {one_for_one, 0, 1},
  {ok, {RestartStrategy, Children}}.

In this design, the Crawler Worker is then responsible for doing the actual work:

-behaviour(gen_server).

start_link() ->
  gen_server:start_link(?MODULE, [], []).

init([]) ->
  inets:start(),        
  httpc:set_options([{verbose_mode,true}]), 
  % gen_server:cast(?MODULE, crawl),
  % ok = do_crawl(),
  {ok, #state{}}.

do_crawl() ->
  % crawl!
  ok.

handle_cast(crawl, State) ->
  ok = do_crawl(),
  {noreply, State};
handle_cast(_Msg, State) ->
  {noreply, State}.

do_crawl spawns a fairly large number of processes and requests that handle the work of crawling via http.

The question, ultimately, is: where should the actual crawl happen? As can be seen above, I have been experimenting with different ways of triggering the actual work, but I am still missing some concept essential for grokking the way things fit together.

Note: some of the OTP plumbing is left out for brevity - the plumbing is all there and the system all hangs together

Toby Hede asked Mar 11 '11 22:03



1 Answer

I apologize if I got your question wrong.

Here are a couple of suggestions to point you in the right direction (or what I consider to be the right direction :)

1 (Rather minor, but still important) I suggest moving the inets startup code out of that worker and into the application startup code (appname_app.erl). As far as I can tell you're using rebar templates, so you should have that file.
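As a rough sketch of what that could look like: the module name crawler_app, the supervisor name crawler_sup and the helper ensure_inets/0 are assumptions made for illustration, not names from the original code.

```erlang
%% Sketch of an application callback module. The names crawler_app and
%% crawler_sup are assumptions, not taken from the original code.
-module(crawler_app).
-behaviour(application).

-export([start/2, stop/1, ensure_inets/0]).

start(_StartType, _StartArgs) ->
    ok = ensure_inets(),
    %% crawler_sup is the (assumed) top-level supervisor of the application
    crawler_sup:start_link().

stop(_State) ->
    ok.

%% Start inets once for the whole node; treat "already running" as success.
ensure_inets() ->
    case inets:start() of
        ok -> ok;
        {error, {already_started, inets}} -> ok
    end.
```

An alternative is to list inets under the applications key in the .app file, which declares that it must be running before your application starts.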

2 Now, on to the essential parts. To make full use of OTP's supervisor model, and assuming that you want to spawn a large number of crawlers, it makes sense to use a simple_one_for_one supervisor instead of one_for_one (read http://www.erlang.org/doc/man/supervisor.html for more details; the essential part is: "simple_one_for_one - a simplified one_for_one supervisor, where all child processes are dynamically added instances of the same process type, i.e. running the same code."). So instead of launching just one process to supervise, you specify a "template" of sorts that describes how to start the worker processes that do the real job. Every worker of that kind is started using supervisor:start_child/2 (http://erldocs.com/R14B01/stdlib/supervisor.html?i=1&search=start_chi#start_child/2). None of those workers will start until you explicitly start them.

2.1 Depending on the nature of your crawlers, you may need to assess what kind of restart type you need for your workers. Right now your template has it set to permanent (although you have a different kind of supervised child). Here are your options:

 Restart defines when a terminated child process should be restarted. A permanent child process should always be restarted, 
 a temporary child process should never be restarted and a transient child process should be restarted only if it terminates 
 abnormally, i.e. with another exit reason than normal.

So, you might want to have something like:

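 -behaviour(supervisor).
 -define(CHILD(I, Type, Restart), {I, {I, start_link, []}, Restart, 5000, Type, [I]}).

 init(_Args) ->
     Children = [
         ?CHILD(crawler, worker, transient)
     ],
     %% Allow up to 5 restarts within 10 seconds (illustrative values).
     %% With a limit of 0, the first restart of a failed worker would
     %% shut down the supervisor itself.
     RestartStrategy = {simple_one_for_one, 5, 10},
     {ok, {RestartStrategy, Children}}.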

I took the liberty of suggesting transient restarts for these children, as that makes sense for this kind of worker (restart them if they fail to do the job, and don't if they complete normally).

2.2 Once you take care of the above items, your supervisor will be handling any number of dynamically added worker processes; and it will be monitoring and restarting (if necessary) each of them, which adds a great deal to your system stability and manageability.
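To make the dynamic part concrete, here is a minimal sketch of such a supervisor together with a helper for adding workers. The module name crawler_sup, the helper start_crawler/1, the Url argument and the restart limits are all assumptions; it also assumes a worker module crawler that exports start_link/1.

```erlang
%% Sketch only: crawler_sup, start_crawler/1 and the restart limits are
%% assumed names and values. The worker module crawler is assumed to
%% export start_link/1.
-module(crawler_sup).
-behaviour(supervisor).

-export([start_link/0, start_crawler/1]).
-export([init/1]).

start_link() ->
    %% Register locally so start_crawler/1 can find the supervisor by name
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Start one more worker. For simple_one_for_one, the list passed here is
%% appended to the arguments in the child spec, so this ends up calling
%% crawler:start_link(Url).
start_crawler(Url) ->
    supervisor:start_child(?MODULE, [Url]).

init([]) ->
    Child = {crawler, {crawler, start_link, []},
             transient, 5000, worker, [crawler]},
    {ok, {{simple_one_for_one, 5, 10}, [Child]}}.
```

With this in place, something like crawler_sup:start_crawler("http://example.com/") would add one supervised crawler per call.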

3 Now, the worker process. I assume that each crawler can be in one of a few particular states at any given moment. For that reason, I would suggest using gen_fsm (finite state machine, more about them available at http://learnyousomeerlang.com/finite-state-machines). Each gen_fsm instance you dynamically add to your supervisor should then send an event to itself in init/1 (using http://erldocs.com/R14B01/stdlib/gen_fsm.html?i=0&search=send_even#send_event/2), so that init/1 returns quickly and the actual work starts only afterwards, without blocking the supervisor.

Something along the lines of:

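   init([Arg1]) ->
       %% queue a 'start' event for ourselves; it is handled right after init/1 returns
       gen_fsm:send_event(self(), start),
       {ok, initialized, #state{ arg1 = Arg1 }}.

   initialized(start, State) ->
       %% do your work
       %% and then either switch to the next state: {next_state, NextStateName, State}
       %% or stop the thing:
       {stop, normal, State}.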

Note that the work can either be done within this gen_fsm process itself, or you might consider spawning a separate process for it, depending on your particular needs.
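For reference, a complete worker module in this style might look roughly like the sketch below, with the work done inside the gen_fsm process. The Url argument, the url record field and the placeholder do_crawl/1 are assumptions for illustration. Newer OTP releases deprecate gen_fsm in favour of gen_statem, so expect a compiler warning there.

```erlang
%% Sketch only: the url field and do_crawl/1 are placeholders, not part
%% of the original code.
-module(crawler).
-behaviour(gen_fsm).

-export([start_link/1]).
-export([init/1, initialized/2, handle_event/3, handle_sync_event/4,
         handle_info/3, terminate/3, code_change/4]).

-record(state, {url}).

start_link(Url) ->
    gen_fsm:start_link(?MODULE, [Url], []).

init([Url]) ->
    %% Queue the start event; it is processed once init/1 has returned
    gen_fsm:send_event(self(), start),
    {ok, initialized, #state{url = Url}}.

%% State callback: do the work, then stop with reason normal so that a
%% transient child is not restarted.
initialized(start, State = #state{url = Url}) ->
    ok = do_crawl(Url),
    {stop, normal, State}.

%% Placeholder for the real crawling code
do_crawl(_Url) ->
    ok.

handle_event(_Event, StateName, State) ->
    {next_state, StateName, State}.

handle_sync_event(_Event, _From, StateName, State) ->
    {reply, ok, StateName, State}.

handle_info(_Info, StateName, State) ->
    {next_state, StateName, State}.

terminate(_Reason, _StateName, _State) ->
    ok.

code_change(_OldVsn, StateName, State, _Extra) ->
    {ok, StateName, State}.
```

If do_crawl/1 crashes instead of returning ok, the process exits abnormally and a supervisor using transient restarts would start it again.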

You might want to have multiple state names for the different phases of your crawling, if that proves necessary.

Either way, I hope this helps you design your application in a somewhat OTP-ish way. Please let me know if you have any questions; I'll be happy to add something if necessary.

Yurii Rashkovskii answered Oct 25 '22 06:10