
Partitioning in clojure with a lazy collection of strings

Starting with a collection of strings like:

(def str-coll ["abcd" "efgh" "jklm"])

The goal is to take a fixed number of characters at a time from the head of the string collection, producing groups of characters that can span string boundaries. This is the desired behavior:

(use '[clojure.contrib.str-utils2 :only (join)])
(partition-all 3 (join "" str-coll))

((\a \b \c) (\d \e \f) (\g \h \j) (\k \l \m))

However, using join forces evaluation of the entire collection, which causes memory issues when dealing with very large collections of strings. My specific use case is generating subsets of strings from a lazy collection generated by parsing a large file of delimited records:

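(use '[clojure.java.io :only (reader)])

;; Lazily pair each ">" header group with the group of lines that follows it.
(defn file-coll [in-file]
  (->> (line-seq (reader in-file))
    (partition-by #(.startsWith ^String % ">"))
    (partition 2)))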

This builds on work from a previous question. I've tried combinations of reduce, partition and join but can't come up with the right incantation to pull characters from the head of the first string and lazily evaluate subsequent strings as needed. Thanks much for any ideas or pointers.
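
To make the shape of the data concrete, here is a minimal sketch of the same pipeline run on a small in-memory input. The name records and the sample text are hypothetical, chosen only for illustration; the sketch assumes clojure.java.io is available and that each record is a ">" header line followed by one or more data lines.

```clojure
(use '[clojure.java.io :only (reader)])

;; Same pipeline as file-coll, applied to any source that
;; clojure.java.io/reader can coerce (a file name, a Reader, ...).
(defn records [source]
  (->> (line-seq (reader source))
       (partition-by #(.startsWith ^String % ">"))
       (partition 2)))

;; Hypothetical sample: two records, each a ">" header plus data lines.
(def sample-text ">rec1\nabcd\nefgh\n>rec2\njklm\n")

(def sample-records (records (java.io.StringReader. sample-text)))
;; sample-records
;; => (((">rec1") ("abcd" "efgh")) ((">rec2") ("jklm")))
```

The second element of each record is a collection of strings shaped like str-coll above, and that is the part to be split into fixed-size character groups.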

Brad Chapman asked Jul 27 '10 23:07


1 Answer

Not quite sure what you're going for, but the following does what your first example does, and does so lazily.

Step-by-step for clarity:

user=> (def str-coll ["abcd" "efgh" "jklm"])
#'user/str-coll
user=> (map seq str-coll)
((\a \b \c \d) (\e \f \g \h) (\j \k \l \m))
user=> (flatten *1)
(\a \b \c \d \e \f \g \h \j \k \l \m)
user=> (partition 3 *1)
((\a \b \c) (\d \e \f) (\g \h \j) (\k \l \m))

All together now:

(->> str-coll 
  (map seq)
  flatten
  (partition 3))
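
A possible variation, offered as a sketch rather than a drop-in replacement: partition drops a trailing group that has fewer than 3 characters, whereas the question used partition-all, which keeps it. Also, because (seq "") is nil, flatten would leave a nil in the output for an empty string, while mapcat simply skips it. The function name chunk-chars below is hypothetical.

```clojure
;; Lazily split a collection of strings into groups of n characters.
;; mapcat concatenates the per-string character seqs on demand;
;; partition-all keeps a final group shorter than n.
(defn chunk-chars [n strs]
  (partition-all n (mapcat seq strs)))

(chunk-chars 3 ["abcd" "efgh" "jklm"])
;; => ((\a \b \c) (\d \e \f) (\g \h \j) (\k \l \m))

(chunk-chars 3 ["abcd" "" "e"])
;; => ((\a \b \c) (\d \e))

;; Works on an infinite input, so the whole collection is not forced:
(take 2 (chunk-chars 3 (repeat "abcd")))
;; => ((\a \b \c) (\d \a \b))
```

Note that lazy sequences in Clojure may be realized a few elements ahead of what is consumed, so "lazy" here means the collection is not read in full, not that it is read strictly one string at a time.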

Alex Taggart answered Sep 23 '22 01:09