Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sort faster in racket using hash table

So I have an example list of elements like this

(define A (list 'a 'c 'd 'e 'f 'e 'a))

Now I want to make a ranking from this sample

(define (scan lst)
    (foldl (lambda (element a-hash) (hash-update a-hash element add1 0))
           (hash)
           lst))

The result should be like this:

> #(('a . 2) ('f . 1) ('e . 2) ....)

Because `scan function will make a hash table containing unique keys and the number of repetitions of that key (if it catches an unindexed key it will create a new place for that new key, counting from 0).

Then I'd like to sort that hash-table because it's unsorted:

(define (rank A)
     (define ranking (scan A))         
     (sort ranking > #:key cdr)))

So the result would look like this:

#(('a . 2) ('e . 2) ('f . 1) ...)

Now I'd like to truncate the hash-table and throw away the bottom at the threshold of n = 1 (aka only take the elements with more than 2 repetitions).

(define (truncate lst n)
    (define l (length lst))
    (define how-many-to-take 
        (for/list
             ([i l]
               #:when (> (cdr (list-ref lst i))
                          n))
               i))
    (take lst (length how-many-to-take)))

So the result might look like this:

(('a . 2) ('e . 2))

However, at the big scale, this procedure is not very efficient, it takes too long. Would you have any suggestion to improve the performance?

Thank you very much,

Part 2:

I have this data structure:

(automaton x 
   (vector (state y (vector a b c))  
           (state y (vector a b c)) ...)) 

Then i generate randomly a population of 1000 of them. Then i scan and rank them using the above functions. If i just scan them as is, it already takes long time. If i try to flatten them into a list like this

(list x y a b c y a b c...)

it'd take even more time. Here is the flatten function:

(define (flatten-au au)
  (match-define (automaton x states) au)
  (define l (vector-length states))
  (define body
    (for/list ([i (in-range l)])
      (match-define (state y z) (vector-ref states i))
      (list y (vector->list z))))
  (flatten (list x body)))

The scan function will look a bit different:

(define (scan population)
    (foldl (lambda (auto a-hash) (hash-update a-hash (flatten-automaton auto) add1 0))
           (hash)
           population))
like image 678
Linh Chi Nguyen Avatar asked Dec 12 '15 18:12

Linh Chi Nguyen


1 Answers

Yep, I believe I see the problem. Your algorithm has O(n^2) ("n-squared") running time. This is because you're counting from one to the length of the list, then for each index, performing a list-ref, which takes time proportional to the size of the index.

This is super-easy to fix.

In fact, there's really no reason to sort it or convert it to a list if this is what you want; just filter the hash table directly. Like this...

#lang racket

(define A (build-list 1000000 (λ (idx) (random 50))))

(define (scan lst)
    (foldl (lambda (element a-hash) (hash-update a-hash element add1 0))
           (hash)
           lst))

(define ht (scan A))

(define only-repeated
  (time
   (for/hash ([(k v) (in-hash ht)]
              #:when (< 1 v))
     (values k v))))

I added the call to time to see how long it takes. For a list of size one million, on my computer this takes a measured time of 1 millisecond.

Asymptotic complexity is important!

like image 108
John Clements Avatar answered Sep 28 '22 06:09

John Clements