Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Idiomatic way of keeping a stateful lookup table with indexes in Clojure

I am fairly new to Clojure and functional programming in general and I've been struggling with the following problem. I'd like to assign a unique and stable index to a series of tokens (strings). Since there will be a lot more lookups than insertions, a hash-map seemed to be the way to go.

In Java I would've written something along the lines of

int last = 0; 
HashMap<String, Integer> lut = new HashMap<String, Integer>();

function Integer getIndex(String token) {
   Integer index = lut.get(token); 
   if(index == null) 
      last++;
      lut.put(token, last);
      return last;
    else { 
      return index;
    }
}

The transliterated version in Clojure would be something like

(def last-index (atom 0))
(def lookup-table (atom {}))

(defn get-index [token]
  (if (nil? (get @lookup-table token))
    (do 
      (swap! last-index inc)
      (swap! lookup-table assoc token @last-index)
      @last-index)
    (get @lookup-table token)))

But this doesn't seem to be very idomatic since it basically side-effects and doesn´t even hide it.

So how would you do this without having the two atoms for keeping state?

like image 436
JoelKuiper Avatar asked Nov 16 '12 11:11

JoelKuiper


1 Answers

The answer given by Ankur is not thread safe, although I don't think seh's description of why is very helpful, and his alternatives are worse. It's reasonable to say "Well I'm not worried about multiple threads now", in which case that answer is fine. But it's valuable to be able to write such things safely even if you don't need that guarantee in any particular instance, and the only safe way is to do all your logic inside the swap!, like so:

(let [m (atom {})]
  (defn get-index [token]
    (get (swap! m
                #(assoc % token (or (% token) (count %))))
         token)))

You can speed this up a bit by avoiding a swap! if there is already an entry when the function is called, and by avoiding an assoc if there is already an entry once you've entered the swap!, but you must "double check" that the map doesn't have an entry for the current token before just assigning it (count %), because some other thread may have snuck in before you started swap!ing (but after you decided to swap!), and assigned a value for the current token, in which case you must respect that assignment instead of making a new one.

Edit: as an aside, the Java version of course has the same thread-safety problem, because by default everything in Java is mutable and not thread-safe. At least in Clojure you have to put a ! in there, saying "Yes, I know this is dangerous, I know what I'm doing."

So in some sense Ankur's solution is a perfect translation of the Java code, but even better would be to improve it!

like image 77
amalloy Avatar answered Oct 23 '22 13:10

amalloy