Starting with a collection of strings like:
(def str-coll ["abcd" "efgh" "jklm"])
The goal is to take a specific number of characters at a time from the head of the string collection, generating a partitioned grouping of characters. This is the desired behavior:
(use '[clojure.contrib.str-utils2 :only (join)])
(partition-all 3 (join "" str-coll))
((\a \b \c) (\d \e \f) (\g \h \j) (\k \l \m))
However, using join forces evaluation of the entire collection, which causes memory issues when dealing with very large collections of strings. My specific use case is generating subsets of strings from a lazy collection generated by parsing a large file of delimited records:
(defn file-coll [in-file]
  (->> (line-seq (reader in-file))
       (partition-by #(.startsWith ^String % ">"))
       (partition 2)))
This builds on work from a previous question. I've tried combinations of reduce, partition, and join, but can't come up with the right incantation to pull characters from the head of the first string and lazily evaluate subsequent strings as needed. Thanks much for any ideas or pointers.
Not quite sure what you're going for, but the following does what your first example does, and does so lazily.
Step-by-step for clarity:
user=> (def str-coll ["abcd" "efgh" "jklm"])
#'user/str-coll
user=> (map seq str-coll)
((\a \b \c \d) (\e \f \g \h) (\j \k \l \m))
user=> (flatten *1)
(\a \b \c \d \e \f \g \h \j \k \l \m)
user=> (partition 3 *1)
((\a \b \c) (\d \e \f) (\g \h \j) (\k \l \m))
All together now:
(->> str-coll (map seq) flatten (partition 3))
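One caveat: partition silently drops a trailing group shorter than 3, whereas the question used partition-all. A possible variant, as a minimal sketch, swaps in partition-all and uses mapcat instead of map plus flatten. The name char-groups is made up for illustration and is not from the original question or answer.

```clojure
;; Hypothetical helper name; not part of the original answer.
(defn char-groups
  "Lazily splits the concatenated characters of strs into groups of n.
  The last group may be shorter than n."
  [n strs]
  (->> strs
       (mapcat seq)        ; lazy stream of characters across all strings
       (partition-all n))) ; keeps a trailing short group, unlike partition

(char-groups 3 ["abcd" "efgh" "jk"])
;; => ((\a \b \c) (\d \e \f) (\g \h \j) (\k))

;; Only a bounded prefix of the input is realized, so an infinite input works:
(take 2 (char-groups 3 (repeat "abcd")))
;; => ((\a \b \c) (\d \a \b))
```

Because mapcat and partition-all both return lazy sequences, this should consume the input strings only as groups are requested, though Clojure may realize a few elements ahead of what is strictly needed.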