Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Idiomatic way to use for, while still maintaining high performance

I have a map that is sorted by its keys which contains data like this:

    (def h {50 Text1
    70 Text2
    372 Text1
    391 Text2
    759 Text1
    778 Text2
    })

The map is sorted by Keys. The key (the number) can be interpreted as the position where the corresponding value was found in a large block of text. In the above example, "Text1" was found at position 50 in the text.

Now, I want to find all Texts that were found within k positions of each other. I define a function a like this:

     (defn nearest [m k]
         (for [m1 (keys m) m2 (keys m)
              :when (and (> m2 m1) (not= (m m1) (m m2)) (< (- m2 m1) k))]
              [m1 (get m m1)  m2 (get m m2)]))

     (nearest h 50)
     ; [[50 "Text1" 70 "Text2"] [372 "Text1" 391 "Text2"] [759 "Text1" 778 "Text2"]]

This works, but is too slow when the map m has 100s of thousands of elements. Because the for loop actually looks at all pairs of elements in the map. Since the map is sorted, for each element in the map, it is not necessary to check further elements, once the next element is already beyond k characters. I was able to write a version using loop and recur. But it is kind of unreadable. Is there a more natural way to do this using for? I am assuming for (:while ) should do the trick, but was not able to find a way.

(defn nearest-quick [m k]
      (let [m1 (keys m) m2 (keys m)]
        (loop [inp m res []  i (first m1) m1 (rest m1) j (first m2) m2 (rest m2)]
          (cond
            (nil? i) res
            (nil? j)(recur inp res (first m1) (rest m1) j m2)
            (= i j) (recur inp res i m1 (first m2) (rest m2))
            (< j i) (recur inp res i m1 (first m2) (rest m2))
            (= (inp i) (inp j)) (recur inp res i m1 (first m2) (rest m2))
            (< (- j i) k) (recur inp (conj res [i (inp i) j (inp j)]) i m1 (first m2) (rest m2))
            (>= (- j i) k) (recur inp res (first m1) (rest m1) (first (rest m1)) (rest (rest m1)))))))

Note: with a map with 42K elements, the first version takes 90 mins and the second version takes 3 mins.

like image 335
TIM S Avatar asked Feb 17 '23 17:02

TIM S


1 Answers

One could probably exploit subseq when the map is a sorted-map.

(defn nearest
  [m n]
  (for [[k v]   m
        [nk nv] (subseq m < k < (+ k n))
        :when (not= v nv)]
    [k v nk nv]))

Code not benchmarked.

like image 180
kotarak Avatar answered Apr 28 '23 01:04

kotarak