Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Retrieve keys from hash-table, sorted by the values, efficiently

Tags:

emacs

lisp

elisp

I'm using Emacs Lisp, but have the cl package loaded, for some common lisp features.

I have a hash table containing up to 50K entries, with integer keys mapped to triplets, something like this (but in actual lisp):

{
  8   => '(9  300 12)
  27  => '(5  125 9)
  100 => '(10 242 14)
}

The second value in the triplet is a score that has been calculated during a complex algorithm that built the hash-table. I need to collect a regular lisp list with all of the keys from the hash, ordered by the score (i.e. all keys ordered by the cadr of the value).

So for the above, I need this list:

'(27 100 8)

I'm currently doing this with two phases, which feels less efficient than it needs to be.

Is there a good way to do this?

My current solution uses maphash to collect the keys and the values into two new lists, then does a sort in the normal way, referring to the list of scores in the predicate. It feels like I could be combining the collection and the sorting together, however.

EDIT | I'm also not attached to using hash-table, though I do need constant access time for the integer keys, which are not linearly spaced.

EDIT 2| It looks like implementing a binary tree sort could work, where the labels in the tree are the scores and the values are the keys... this way I'm doing the sort as I map over the hash.

... Continues reading wikipedia page on sorting algorithms

like image 362
d11wtq Avatar asked Jun 12 '13 12:06

d11wtq


1 Answers

Basically, you solution is correct: you need to collect the keys into a list:

(defun hash-table-keys (hash-table)
  (let ((keys ()))
    (maphash (lambda (k v) (push k keys)) hash-table)
    keys))

and then sort the list:

(sort (hash-table-keys hash-table)
      (lambda (k1 k2)
        (< (second (gethash k1 hash-table))
           (second (gethash k2 hash-table)))))

Combining key collection with sorting is possible: you need to collect the keys into a tree and then "flatten" the tree. However, this will only matter if you are dealing with really huge tables. Also, since Emacs Lisp compiles to bytecodes, you might find that using the sort built-in is still faster than using a tree. Consider also the development cost - you will need to write code whose value will be mostly educational.

Delving deeper, collecting the keys allocates the list of keys (which you will certainly need anyway for the result) and sort operates "in-place", so the "simple way" is about as good as it gets.

The "tree" way will allocate the tree (the same memory footprint as the required list of keys) and populating and flattening it will be the same O(n*log(n)) process as the "collect+sort" way. However, keeping the tree balanced, and then flattening it "in-place" (i.e., without allocating a new list) is not a simple exercise.

The bottom line is: KISS.

like image 196
sds Avatar answered Nov 15 '22 10:11

sds