
In Clojure, are lazy seqs always chunked?

I was under the impression that the lazy seqs were always chunked.

=> (take 1 (map #(do (print \.) %) (range)))
(................................0)

As expected, 32 dots are printed because the lazy seq returned by range is chunked into 32-element chunks. However, when I try this with my own function get-rss-feeds instead of range, the lazy seq is no longer chunked:

=> (take 1 (map #(do (print \.) %) (get-rss-feeds r)))
(."http://wholehealthsource.blogspot.com/feeds/posts/default")

Only one dot is printed, so I guess the lazy-seq returned by get-rss-feeds is not chunked. Indeed:

=> (chunked-seq? (seq (range)))
true

=> (chunked-seq? (seq (get-rss-feeds r)))
false

Here is the source for get-rss-feeds:

(defn get-rss-feeds
  "returns a lazy seq of urls of all feeds; takes an html-resource from the enlive library"
  [hr]
  (map #(:href (:attrs %))
       (filter #(rss-feed? (:type (:attrs %))) (html/select hr [:link]))))

So it appears that chunkiness depends on how the lazy seq is produced. I peeked at the source for the function range and there are hints of it being implemented in a "chunky" manner. So I'm a bit confused as to how this works. Can someone please clarify?


Here's why I need to know.

I have the following code: (get-rss-entry (get-rss-feeds h-res) url)

The call to get-rss-feeds returns a lazy sequence of URLs of feeds that I need to examine.

The call to get-rss-entry looks for a particular entry (whose :link field matches the second argument of get-rss-entry). It examines the lazy sequence returned by get-rss-feeds. Evaluating each item requires an http request across the network to fetch a new rss feed. To minimize the number of http requests it's important to examine the sequence one-by-one and stop as soon as there is a match.

Here is the code:

(defn get-rss-entry
  [feeds url]
  (ffirst (drop-while empty? (map #(entry-with-url % url) feeds))))

entry-with-url returns a lazy sequence of matches or an empty sequence if there is no match.

I tested this and it seems to work correctly (evaluating one feed url at a time). But I am worried that somewhere, somehow it will start behaving in a "chunky" way and it will start evaluating 32 feeds at a time. I know there is a way to avoid chunky behavior as discussed here, but it doesn't seem to even be required in this case.

Am I using lazy seq non-idiomatically? Would loop/recur be a better option?

Geo G asked Sep 13 '12 17:09



2 Answers

You are right to be concerned. Your get-rss-entry will indeed call entry-with-url more often than strictly necessary if the feeds parameter is a collection that returns chunked seqs. For example, if feeds is a vector, map will operate on whole chunks at a time.
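The effect can be observed with a simple counter. This is only a minimal sketch, not the asker's code: slow-lookup is a hypothetical stand-in for entry-with-url, and the atom calls exists purely to count invocations.

```clojure
;; Hypothetical stand-in for entry-with-url: records each call.
(def calls (atom 0))

(defn slow-lookup [x]
  (swap! calls inc)
  x)

;; The seq of a vector is chunked, so map works through a whole chunk at once.
(def feeds (vec (range 40)))

;; Ask for just the first mapped element.
(first (map slow-lookup feeds))

;; How many calls were needed to produce that single element?
(def calls-for-vector @calls)
(println calls-for-vector)
```

With a 40-element vector on current Clojure implementations, this should report 32 calls rather than 1, because the first chunk of a vector's seq holds 32 elements.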

This problem is addressed directly in Fogus' Joy of Clojure, with the function seq1 defined in chapter 12:

(defn seq1 [s]
  (lazy-seq
    (when-let [[x] (seq s)]
      (cons x (seq1 (rest s)))))) 

You could use this right where you know you want the most laziness possible, right before you call entry-with-url:

(defn get-rss-entry
  [feeds url]
  (ffirst (drop-while empty? (map #(entry-with-url % url) (seq1 feeds)))))
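
To check that seq1 has the intended effect, the same kind of counter can be used. Again this is a sketch under assumptions: counted is a hypothetical stand-in for the entry-with-url call, and seq1 is repeated here only so that the snippet runs by itself.

```clojure
;; seq1 as given above: rebuilds the seq one cons cell at a time,
;; which hides any chunking of the underlying collection from map.
(defn seq1 [s]
  (lazy-seq
    (when-let [[x] (seq s)]
      (cons x (seq1 (rest s))))))

;; Hypothetical stand-in for entry-with-url: records each call.
(def calls (atom 0))

(defn counted [x]
  (swap! calls inc)
  x)

;; Same 40-element vector input, but wrapped in seq1 before mapping.
(first (map counted (seq1 (vec (range 40)))))

(def calls-with-seq1 @calls)
(println calls-with-seq1)
```

Here only one call should be recorded, since map sees an unchunked seq and realizes a single element at a time.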

Chouser answered Sep 22 '22 14:09


Lazy seqs are not always chunked - it depends on how they are produced.

For example, the lazy seq produced by this function is not chunked:

(defn integers-from [n]
  (lazy-seq (cons n (do (print \.) (integers-from (inc n))))))

(take 3 (integers-from 3))
=> (..3 .4 5)

But many other Clojure built-in functions do produce chunked seqs for performance reasons (e.g. range).
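
A quick way to see which kind of seq you have is chunked-seq?. The following sketch compares a few sources. It assumes a recent Clojure version, uses a bounded range, and redefines integers-from without the print call so that the check stays quiet.

```clojure
;; An unchunked lazy seq, built one cons cell at a time (no printing here).
(defn integers-from [n]
  (lazy-seq (cons n (integers-from (inc n)))))

;; chunked-seq? inspects the realized seq, hence the explicit seq calls.
(def results
  {:range       (chunked-seq? (seq (range 100)))           ; bounded range
   :vector      (chunked-seq? (seq [1 2 3]))               ; vector seq
   :mapped      (chunked-seq? (seq (map inc (range 100)))) ; map over a chunked seq
   :hand-rolled (chunked-seq? (seq (integers-from 0)))     ; lazy-seq + cons
   :list        (chunked-seq? (seq '(1 2 3)))})            ; plain list

(println results)
```

The range, vector, and mapped entries should come out true, and the hand-rolled and list entries false. The mapped case is the one to watch for: map preserves the chunking of its input, so chunkiness can be inherited from a collection further up the pipeline.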

mikera answered Sep 18 '22 14:09