Clojure: get list of regex matches

Tags:

regex

clojure

Perhaps I'm going about this all wrong, but I'm trying to get all the matches in a string for a particular regex pattern. I'm using re-matcher to get a Matcher object, which I pass to re-find, giving me (full-string-match, grouped-text) pairs. How would I get a sequence of all the matches produced by the Matcher object?

In Clojuresque Python, it would look like:

pairs = []
match = re-matcher(regex, line)

while True:
    pair = re-find(match)
    if not pair: break
    pairs.append(pair)

Any suggestions?

asked Oct 18 '10 at 20:10 by exupero



1 Answer

You probably want to use the built-in re-seq together with Clojure's regex literal syntax (#"..."). Don't mess with the underlying Java objects unless you really have to.

(doc re-seq)


clojure.core/re-seq
([re s])
  Returns a lazy sequence of successive matches of pattern in string,
  using java.util.regex.Matcher.find(), each such match processed with
  re-groups. 
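The docstring's description (repeated Matcher.find() calls, each result passed through re-groups) is essentially the question's while loop. For comparison, here is a minimal sketch of that loop written out by hand with re-matcher and re-find. `all-matches` is a hypothetical helper name, not part of clojure.core, and unlike re-seq it collects the matches eagerly into a vector.

```clojure
;; Sketch only: an eager, explicit version of the question's while loop.
;; `all-matches` is a hypothetical helper, not a clojure.core function.
(defn all-matches
  "Collects every match of re in s into a vector by calling re-find
  repeatedly on a single java.util.regex.Matcher."
  [re s]
  (let [m (re-matcher re s)]          ; one stateful Matcher for the whole string
    (loop [acc []]
      (if-let [match (re-find m)]     ; re-find advances m; returns nil when exhausted
        (recur (conj acc match))
        acc))))

(all-matches #"the (\w+(t))" "the cat sat on the mat")
;; => [["the cat" "cat" "t"] ["the mat" "mat" "t"]]
```

In practice re-seq already does this for you, lazily, which is why the answer recommends it.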

For example:

user> (re-seq #"the \w+" "the cat sat on the mat")
("the cat" "the mat")
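Because re-seq returns a lazy sequence, you only pay for the matches you actually consume. A small sketch (the input string is made up for illustration): taking the first two elements forces only the first two matches.

```clojure
;; re-seq is lazy, so `take` only forces the first two matches here.
(def first-two (take 2 (re-seq #"\d+" "10 20 30 40")))

first-two
;; => ("10" "20")
```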

In answer to the follow-up comment: when the pattern contains capture groups, each match is returned as a vector of strings, with the full match first followed by one element per capture group:

user> (re-seq #"the (\w+(t))" "the cat sat on the mat")
(["the cat" "cat" "t"] ["the mat" "mat" "t"])

You can extract a specific element by taking advantage of the elegant fact that vectors are functions of their indices.

user> (defn extract-group [n] (fn [group] (group n)))
#'user/extract-group
user> (let [matches (re-seq #"the (\w+(t))" "the cat sat on the mat")]
       (map (extract-group 1) matches))
("cat" "mat")
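The same extraction can be sketched without a named helper: since each match vector is itself a function of its indices, an anonymous function can call it directly, and `second` works for group 1 specifically. The var name `matches` below is just for illustration.

```clojure
;; Each element of `matches` is a vector like ["the cat" "cat" "t"].
(def matches (re-seq #"the (\w+(t))" "the cat sat on the mat"))

(map #(% 1) matches)   ; call each vector with index 1 => ("cat" "mat")
(map second matches)   ; same thing for group 1        => ("cat" "mat")
```

This only applies when the pattern has capture groups: without groups, re-seq yields plain strings, which are not callable as functions.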

Or you can destructure the matches. Here the for macro iterates over all the matches, but the same destructuring works in a let or in a function's argument vector:

user> (dorun 
        (for [[m1 m2 m3] (re-seq #"the (\w+(t))" "the cat sat on the mat")]  
          (do (println "m1:" m1) 
              (println "m2:" m2) 
              (println "m3:" m3))))
m1: the cat
m2: cat
m3: t
m1: the mat
m2: mat
m3: t
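As a sketch of the function-argument variant mentioned above: `describe-match` is a hypothetical helper whose argument vector destructures one match directly. The second form shows that doseq accepts the same destructuring; since it exists for side effects, it can stand in for the dorun/for pair when all you want is to print.

```clojure
;; Hypothetical helper: the match vector is destructured in the
;; argument list, so each capture group gets its own name.
(defn describe-match
  [[whole word last-letter]]
  (str word " ends in " last-letter " (full match: " whole ")"))

(map describe-match (re-seq #"the (\w+(t))" "the cat sat on the mat"))
;; => ("cat ends in t (full match: the cat)" "mat ends in t (full match: the mat)")

;; doseq takes the same binding form and is meant for side effects.
(doseq [[whole word last-letter] (re-seq #"the (\w+(t))" "the cat sat on the mat")]
  (println "m1:" whole)
  (println "m2:" word)
  (println "m3:" last-letter))
```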
answered Oct 21 '22 at 13:10 by Alex Stoddard
