I am writing my first Clojure program.
I am using clojure.data.csv to process a CSV file. My file is potentially large, so I do want to exploit laziness. My MWE code to demonstrate the issue is shown below.
When I execute the load-data function, I get "IOException Stream closed", so it is clear to me that the lazy stream is being closed before the point of consumption.
I have looked over the documentation for data.csv (https://github.com/clojure/data.csv) and can see that one way to prevent the stream from being closed before consumption is to move the stream opening to the call stack where the stream is consumed. As far as I understand it, this is what I have done below, since (take 5) is within the confines of with-open. Clearly, I have a conceptual gap. Deeply appreciate any help!
    (ns data-load.core
      (:gen-class)
      (:require [clojure.data.csv :as csv]
                [clojure.java.io :as io]))

    (defn load-data [from to]
      (with-open [reader (io/reader from)
                  writer (io/writer to)]
        (->> (csv/read-csv reader)
             (take 5))))
As you said, what you're returning from load-data is a lazy sequence, and by the time it's consumed you've already left the scope of with-open. You just need to force the realization of the lazy sequence before returning it.

"As far as I understand it, this is what I have done below, since (take 5) is within the confines of with-open."

It is within the scope, but take also returns a lazy sequence! It has only wrapped one lazy sequence in another, and that outer sequence won't be realized until after the with-open scope has been exited. From the clojure.data.csv examples:
    (defn sum-second-column [filename]
      (with-open [reader (io/reader filename)]
        (->> (read-column reader 1)
             (drop 1)
             (map #(Double/parseDouble %))
             (reduce + 0)))) ;; this is the only non-lazy operation
The important observation here is that the final operation is reduce, which is going to consume the lazy sequence. If you took reduce out and tried to consume the produced sequence from outside the function, you'd get the same "stream closed" exception.
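You can see this at the REPL: take returns a clojure.lang.LazySeq without realizing any elements, even when wrapping an infinite source.

    ;; take only wraps one lazy sequence in another;
    ;; nothing is evaluated until something consumes it:
    (class (take 5 (map inc (range))))
    ;; => clojure.lang.LazySeq

This is why (take 5) inside with-open is not enough on its own: it defers all the reading until after the stream is gone.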
One way to do this is to turn the sequence into a vector with vec, or to use doall, which will also force it to be realized:
    (defn load-data [from]
      (with-open [reader (io/reader from)]
        (->> (csv/read-csv reader)
             (take 5)
             ;; other intermediate steps go here
             (doall))))
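For comparison, the vec variant differs only in the final step. This sketch is the same function, but it returns a fully realized vector of rows instead of a sequence:

    ;; vec eagerly pours the rows into a vector
    ;; before with-open closes the reader:
    (defn load-data [from]
      (with-open [reader (io/reader from)]
        (->> (csv/read-csv reader)
             (take 5)
             vec)))

Either way, all the rows you intend to keep are realized while the reader is still open.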
"My file is potentially large and so I do want to exploit laziness."

You'll need a way to do all your work before the stream is closed, so you could supply a function to load-data to perform on each row of the CSV:
    (defn load-data [from f]
      (with-open [reader (io/reader from)]
        (doall (map f (csv/read-csv reader)))))
For example, concatenate the row values into strings:
    (load-data (io/resource "input.txt")
               (partial apply str))
    => ("abc" "efg")
If you want a lazy solution, then check out https://stackoverflow.com/a/13312151/954570 (all the credit goes to the original authors https://stackoverflow.com/users/181772/andrew-cooke and https://stackoverflow.com/users/611752/johnj).
The idea is to manage the reader's open/close manually and keep the reader open until the sequence is exhausted. It comes with its own quirks but worked well for me (I needed to merge and process data from multiple large files that won't fit in memory).
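A minimal sketch of that idea (the name lazy-csv-rows and the step helper are my own, hypothetical names, not from the linked answer): the reader is opened outside with-open and is closed by the sequence itself once the last row has been produced.

    (ns data-load.lazy
      (:require [clojure.data.csv :as csv]
                [clojure.java.io :as io]))

    ;; Hypothetical sketch: keep the reader open and close it only
    ;; when the underlying row sequence is exhausted.
    (defn lazy-csv-rows [filename]
      (let [reader (io/reader filename)
            step   (fn step [rows]
                     (lazy-seq
                       (if-let [s (seq rows)]
                         (cons (first s) (step (rest s)))
                         (.close reader))))]
        (step (csv/read-csv reader))))

The quirk mentioned above is visible here: if the consumer never walks the sequence to its end, (.close reader) never runs and the file handle leaks.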