Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Erlang: Distributed work on an array

I'm working on a project where we have an array of atoms which acts as a hash. Whenever a user connects to the server a certain value is hashed, and that hash is used as an index to lookup the element in the array, and return that element. "Outside forces" (which are handled by a long-running gen_server) are able to change this array, so I can't simply hardcode it. My problem is how to "host" this array.

My first implementation was a simple gen_server which kept a copy of the array around and sent it to whoever asked for it. The process asking for it could then traverse it and get the index they want. This implementation had and inordinate amount of memory being used, which I attributed to there being so many copies of this same array floating around.

My current implementation has a central gen_server which handles the state of this array, and children which handle the actual requests. When the state changes the central gen_server updates the children. When a process wants to find it's hash result it sends its index number to the central gen_server, which forwards the request to one of the children. The child traverses its "local" list, and sends the resulting atom back to the original process.

The problem with the current implementation is that it gets bogged down at high traffic. I've tried using more and more children, but I'm pretty sure the central gen_server is the bottleneck.

Does anyone have any ideas on a better solution to my problem?

EDIT: %s/array/list/g

like image 780
Mediocre Gopher Avatar asked Feb 01 '12 22:02

Mediocre Gopher


2 Answers

I suggest that you use ETS Tables.I think that the Array method is not efficient enough. With an ETS Table, created as public within the application backend, any process can lookup an item as soon as it needs it. ETS Tables in the current newer versions of erlang have the capability for concurrent access.

%% Lets create a record structure 
%% where by the key will be a value
%% in the array.
%% For now, i do not know what to 
%% put in the field: 'other'
-record(element,{key,other}).
create_table(TableName)-> Options = [ named_table,set, public, {keypos,2}, %% coz we are using record NOT tuple {write_concurrency,true} ], case ets:new(TableName,Options) of TableName -> {success,true}; Error -> {error,Error} end.
lookup_by_hash(TableName,HashValue)-> try ets:lookup(TableName,HashValue) of Value -> {value,Value}; catch X:Y -> {error,{X,Y}} end.
With this kind of arrangement, you will avoid A Single Point of Failure arising from a single gen_server holding data. This data is needed by many processes and hence should not be held by a single process. That's where a Table accessible by any process at any time as soon as it needs to make a look up.

The Values in the Array should be converted to records of the form as element and then inserted in the ETS Tables.

Advantages of this approach

1. We can create as many ETS Tables as possible
2. An ETS Table can handle many more elements than a data structure such as a list or an Array with much lower comparable memory consumption.
3. ETS Tables can be concurrently accessed by any process within reach and hence you will not need a central process or server to handle data
4. A single process or gen_server holding this data, means that if its compromised (goes down due to a full mail box), it will be unavailable, hence the processes which need the array will have to wait for this one server to either restart or i dont know....
5. Accessing the Array data by sending request messages plus making copies of the same array to each process that needs it is not "Erlangic" design.
6. Finally, ETS Tables ownership can be transferred from process to process. When the owning process is crashing (Only gen_servers can detect that they are dying [take note of this]), it can transfer the ETS Table to another process to take over. Check here: ETS Give Away

That's my thinking.
like image 134
Muzaaya Joshua Avatar answered Nov 15 '22 08:11

Muzaaya Joshua


Not sure if this helps, but could you manage the central hash value in a distributed hash-table (independent from your hash business) just as any other value? That way multiple process can take the load instead of one central process.

From what I read, the array does not seem to really need to be an array.

like image 35
Enno Shioji Avatar answered Nov 15 '22 08:11

Enno Shioji