
Compact Clojure code for regular expression matches and their position in string

Tags: regex, clojure

Stuart Halloway gives the example

(re-seq #"\w+" "The quick brown fox")

as the natural way to find regex matches in Clojure. In his book, this construction is contrasted with iteration over a matcher. If all one cared about were a list of matches, this would be great. However, what if I want the matches and their positions within the string? Is there a better way of doing this that lets me leverage the existing functionality in java.util.regex without resorting to something like a sequence comprehension over each index in the original string? In other words, one would like to type something like

(re-seq-map #"[0-9]+" "3a1b2c1d")

which would return a map with keys as the position and values as the matches, e.g.

{0 "3", 2 "1", 4 "2", 6 "1"}

Is there an implementation of this in an existing library already, or shall I write it (it shouldn't be too many lines of code)?

Gabriel Mitchell asked Jul 16 '10 05:07



2 Answers

You can fetch the data you want out of a java.util.regex.Matcher object.

user> (defn re-pos [re s]
        (loop [m (re-matcher re s)
               res {}]
          (if (.find m)
            (recur m (assoc res (.start m) (.group m)))
            res)))
#'user/re-pos
user> (re-pos #"\w+" "The quick brown fox")
{16 "fox", 10 "brown", 4 "quick", 0 "The"}
user> (re-pos #"[0-9]+" "3a1b2c1d")
{6 "1", 4 "2", 2 "1", 0 "3"}
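
The results above come back in hash-map order, so the positions are not sorted. If ascending order matters, one possible variation is to accumulate into a sorted map instead. This is only a sketch: the name re-pos-sorted is hypothetical and not part of the original answer.

```clojure
;; Hypothetical variant of re-pos: same Matcher loop, but the accumulator
;; is a sorted map, so keys (start positions) come back in ascending order.
(defn re-pos-sorted
  "Returns a sorted map of start index -> matched text for each match of re in s."
  [re s]
  (let [m (re-matcher re s)]
    (loop [res (sorted-map)]
      (if (.find m)
        (recur (assoc res (.start m) (.group m)))
        res))))

(re-pos-sorted #"\w+" "The quick brown fox")
;; => {0 "The", 4 "quick", 10 "brown", 16 "fox"}
```

The matcher is bound once in a let rather than threaded through loop, since it is a mutable object and never rebound.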
Brian Carper answered Nov 17 '22 06:11



You can apply any function to the java.util.regex.Matcher object and return its results (similar to Brian's solution, but without an explicit loop):

user=> (defn re-fun
         [re s fun]
         (let [matcher (re-matcher re s)]
           (take-while some? (repeatedly #(if (.find matcher) (fun matcher) nil)))))
#'user/re-fun

user=> (defn fun1 [m] (vector (.start m) (.end m)))
#'user/fun1

user=> (re-fun #"[0-9]+" "3a1b2c1d" fun1)
([0 1] [2 3] [4 5] [6 7])

user=> (defn re-seq-map
         [re s]
         (into {} (re-fun re s #(vector (.start %) (.group %)))))

user=> (re-seq-map #"[0-9]+" "3a1b2c1d")
{0 "3", 2 "1", 4 "2", 6 "1"}
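
Two things are worth keeping in mind with re-fun as written: take-while stops at the first nil that fun returns, and the lazy sequence shares one mutable Matcher. Where that is a concern, an eager version that copies the values out of the matcher at each step may be easier to reason about. The sketch below is one way to do that; the name re-matches-with-pos and the :start/:end/:match keys are hypothetical choices, not part of the original answer.

```clojure
;; Hypothetical helper: eagerly walks the Matcher and copies start, end and
;; matched text into an immutable map for every match.
(defn re-matches-with-pos
  "Returns a vector of {:start :end :match} maps, one per match of re in s."
  [re s]
  (let [m (re-matcher re s)]
    (loop [acc []]
      (if (.find m)
        (recur (conj acc {:start (.start m)
                          :end   (.end m)
                          :match (.group m)}))
        acc))))

(re-matches-with-pos #"[0-9]+" "3a1b2c1d")
;; => [{:start 0, :end 1, :match "3"} {:start 2, :end 3, :match "1"}
;;     {:start 4, :end 5, :match "2"} {:start 6, :end 7, :match "1"}]
```

Because the result is a vector, match order is preserved, and the position map from the question can still be built with (into {} (map (juxt :start :match) ...)).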
Karol answered Nov 17 '22 07:11
