
What is the best-performing Erlang module for storing a large list of key/term values in a single term?

Tags:

erlang

With a focus on read performance, I want to create a term, such as an orddict or a proplist, that contains a large number of entries (hundreds of thousands), each consisting of an ID and a term value. This encapsulating term should be able to return the value stored under a given key, just as an orddict does.

Example:

 K001 - Term001
 K002 - Term002
 K003 - Term003

The resulting term, containing the whole set, needs to be passed from function to function for several computations without being written to a persistent store, to avoid disk I/O. I have also chosen not to use memory caching at this stage, to avoid architectural complexity, so my focus is on making the whole set simply searchable by key.

Orddicts are sorted by key, which speeds up key lookup compared to a normal dict. I am not aware of any other Erlang module that embeds a more efficient indexing mechanism within its term.

Any suggestions for an approach better than an orddict?

gextra asked Mar 16 '13 13:03


2 Answers

Actually, orddict is implemented as a plain sorted list, so both insertion and lookup take time linear in the number of entries. It performs poorly for both, especially when the keys are inserted in ascending order, because each insert then has to walk the whole list. Stay away from it; it won't work for your use case. dict is a hash-based data structure and offers solid insert and lookup performance. If the order of keys is important to you, consider using a tree-based map such as gb_trees, from which you can extract an ordered key sequence with an in-order tree walk.
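
To make the comparison concrete, here is a minimal sketch that loads the same entries into an orddict, a dict, and a gb_trees tree and looks a key up in each. The module name kv_demo, the integer keys, and the {term, K} values are assumptions made for illustration only; they are not from the question.

```erlang
-module(kv_demo).
-export([build/1, lookup/2, ordered_keys/1]).

%% Build the same N entries in three containers.
%% Keys 1..N and values {term, K} are hypothetical placeholders.
build(N) ->
    Pairs = [{K, {term, K}} || K <- lists:seq(1, N)],
    Ord  = orddict:from_list(Pairs),     % sorted list of {Key, Value}
    Dict = dict:from_list(Pairs),        % hash-based dictionary
    Tree = gb_trees:from_orddict(Ord),   % balanced binary tree
    {Ord, Dict, Tree}.

%% Look the same key up in all three containers.
%% orddict and dict return {ok, Value} | error;
%% gb_trees returns {value, Value} | none.
lookup(Key, {Ord, Dict, Tree}) ->
    {orddict:find(Key, Ord),
     dict:find(Key, Dict),
     gb_trees:lookup(Key, Tree)}.

%% gb_trees can return its keys in ascending order.
ordered_keys({_Ord, _Dict, Tree}) ->
    gb_trees:keys(Tree).
```

To compare read times on your own data, one option is to wrap each lookup in timer:tc/3 and repeat it over a sample of keys; the numbers depend on key type, table size, and OTP release, so measure rather than assume.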

Martin Törnwall answered Nov 16 '22 01:11


If you want to share a large dataset between Erlang processes, you can try ETS. It is a fast in-memory key-value store that supports only destructive updates: a write changes the table in place instead of returning a new term.
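
A minimal sketch of the ETS approach, assuming integer keys and {term, K} values as placeholders. The module name ets_demo, the table name kv_store, and the helper names new_table/1 and fetch/2 are hypothetical; only the ets calls themselves are part of OTP.

```erlang
-module(ets_demo).
-export([new_table/1, fetch/2]).

%% Create a table and fill it with N hypothetical entries.
%% 'set' gives one value per key; 'protected' lets other processes read
%% while only the owner writes; read_concurrency tunes the table for
%% read-heavy use.
new_table(N) ->
    Tab = ets:new(kv_store, [set, protected, {read_concurrency, true}]),
    true = ets:insert(Tab, [{K, {term, K}} || K <- lists:seq(1, N)]),
    Tab.

%% Look a key up; the result mirrors dict:find/2 for easy swapping.
fetch(Tab, Key) ->
    case ets:lookup(Tab, Key) of
        [{Key, Value}] -> {ok, Value};
        []             -> error
    end.
```

Instead of passing the whole data set from function to function, you pass the table identifier returned by new_table/1. Two things to keep in mind: the table belongs to the process that created it and disappears when that process exits, and each lookup copies the stored value onto the caller's heap, which can matter for very large values.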

arcusfelis answered Nov 16 '22 03:11
