I was under the impression that lazy seqs were always chunked.
=> (take 1 (map #(do (print \.) %) (range)))
(................................0)
As expected, 32 dots are printed because the lazy seq returned by range is chunked into 32-element chunks. However, when I try this with my own function get-rss-feeds instead of range, the lazy seq is no longer chunked:
=> (take 1 (map #(do (print \.) %) (get-rss-feeds r)))
(."http://wholehealthsource.blogspot.com/feeds/posts/default")
Only one dot is printed, so I guess the lazy-seq returned by get-rss-feeds
is not chunked. Indeed:
=> (chunked-seq? (seq (range)))
true
=> (chunked-seq? (seq (get-rss-feeds r)))
false
Here is the source for get-rss-feeds:
(defn get-rss-feeds
  "returns a lazy seq of urls of all feeds; takes an html-resource from the enlive library"
  [hr]
  (map #(:href (:attrs %))
       (filter #(rss-feed? (:type (:attrs %))) (html/select hr [:link]))))
So it appears that chunkiness depends on how the lazy seq is produced. I peeked at the source for the function range and there are hints of it being implemented in a "chunky" manner. So I'm a bit confused as to how this works. Can someone please clarify?
Here's why I need to know.
I have the following code: (get-rss-entry (get-rss-feeds h-res) url)
The call to get-rss-feeds returns a lazy sequence of URLs of feeds that I need to examine.
The call to get-rss-entry looks for a particular entry (whose :link field matches the second argument of get-rss-entry). It examines the lazy sequence returned by get-rss-feeds. Evaluating each item requires an HTTP request across the network to fetch a new RSS feed. To minimize the number of HTTP requests, it's important to examine the sequence one item at a time and stop as soon as there is a match.
Here is the code:
(defn get-rss-entry
  [feeds url]
  (ffirst (drop-while empty? (map #(entry-with-url % url) feeds))))
entry-with-url returns a lazy sequence of matches or an empty sequence if there is no match.
I tested this and it seems to work correctly (evaluating one feed url at a time). But I am worried that somewhere, somehow it will start behaving in a "chunky" way and it will start evaluating 32 feeds at a time. I know there is a way to avoid chunky behavior as discussed here, but it doesn't seem to even be required in this case.
Am I using lazy seq non-idiomatically? Would loop/recur be a better option?
You are right to be concerned. Your get-rss-entry will indeed call entry-with-url more than strictly necessary if the feeds parameter is a collection that returns chunked seqs. For example, if feeds is a vector, map will operate on whole chunks at a time.
This problem is addressed directly in Fogus' Joy of Clojure, with the function seq1 defined in chapter 12:
(defn seq1 [s]
  (lazy-seq
    (when-let [[x] (seq s)]
      (cons x (seq1 (rest s))))))
You could use this right where you know you want the most laziness possible, right before you call entry-with-url:
(defn get-rss-entry [feeds url]
  (ffirst (drop-while empty? (map #(entry-with-url % url) (seq1 feeds)))))
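To see the difference seq1 makes, here is a minimal sketch that counts how often a mapped function runs when only the first result is requested. The helper calls-for-first and the feeds vector of numbers are hypothetical stand-ins for illustration; they are not part of the original code.

```clojure
;; seq1 as quoted above: rebuilds the seq one cons cell at a time,
;; so downstream functions such as map no longer see chunks.
(defn seq1 [s]
  (lazy-seq
    (when-let [[x] (seq s)]
      (cons x (seq1 (rest s))))))

;; Hypothetical helper: counts how many times the mapped fn runs
;; when only the first element of the result is requested.
(defn calls-for-first [coll]
  (let [calls (atom 0)]
    (first (map (fn [x] (swap! calls inc) x) coll))
    @calls))

;; Stand-in for a vector of feed URLs (an assumption for this demo).
(def feeds (vec (range 100)))

(calls-for-first feeds)        ;=> 32, map consumed a whole chunk
(calls-for-first (seq1 feeds)) ;=> 1, one element at a time
```

If each call were an HTTP request, the vector case would fetch 32 feeds to answer a question that needed only one.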
Lazy seqs are not always chunked - it depends on how they are produced.
For example, the lazy seq produced by this function is not chunked:
(defn integers-from [n]
  (lazy-seq (cons n (do (print \.) (integers-from (inc n))))))
(take 3 (integers-from 3))
=> (..3 .4 5)
But many of Clojure's built-in functions do produce chunked seqs for performance reasons (e.g. range).
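Chunkiness also tends to carry through map and filter, so what matters is the seq at the bottom of the pipeline. The sketch below uses made-up data (the names from-vector and from-list are illustrative) to show the same map/filter pipeline giving a chunked result over a vector and an unchunked one over a list. Presumably the seq returned by html/select in the question is of the unchunked kind, which would explain the single dot, but that is an assumption about enlive rather than something checked here.

```clojure
;; The same pipeline over two different sources.
;; Vectors yield chunked seqs; lists do not.
(def from-vector (map inc (filter odd? [1 2 3 4 5])))
(def from-list   (map inc (filter odd? '(1 2 3 4 5))))

;; map and filter preserve the chunkiness of their input.
(chunked-seq? (seq from-vector)) ;=> true
(chunked-seq? (seq from-list))   ;=> false
```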