
Processing a file character by character in Clojure

Tags:

clojure

I'm writing a function in Clojure that will process a file character by character. I know that Java's BufferedReader class has a read() method that reads one character, but I'm new to Clojure and not sure how to use it. For now, I'm just trying to read the file line by line and then print each word.

(defn process_file [file_path]
(with-open [reader (BufferedReader. (FileReader. file_path))]
    (let [seq (line-seq reader)]
        (doseq [item seq]
            (let [words (split item #"\s")]
                (println words))))))

Given a file with this text input:

International donations are gratefully accepted, but we cannot make any statements concerning tax treatment of donations received from outside the United States. U.S. laws alone swamp our small staff.

My output looks like this:

[International donations are gratefully accepted, but we cannot make]
[any statements concerning tax treatment of donations received from]
[outside the United States.  U.S. laws alone swamp our small staff.]

Though I would expect it to look like:

["international" "donations" "are" .... ]

So my question is, how can I convert the function above to read character by character? Or even, how to make it work as I expect it to? Also, any tips for making my Clojure code better would be greatly appreciated.

Jack Slingerland asked Jul 26 '12 12:07


1 Answer

(with-open [reader (clojure.java.io/reader "path/to/file")] ...

This is how I prefer to get a reader in Clojure. And by "character by character", do you mean at the file-access level, like read, which lets you control how many bytes to read?
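As a rough sketch of how that reader could be used for the line-by-line version in the question: the helper name line-words is hypothetical, and a StringReader stands in for the file so the snippet is self-contained (clojure.java.io/reader also accepts a path string or an existing Reader). It assumes the goal is one vector of words per line.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn line-words
  "Hypothetical helper: returns a vector holding one vector of words per line."
  [source]
  (with-open [rdr (io/reader source)]
    ;; mapv is eager, so every line is read before with-open closes the reader
    (mapv #(str/split % #"\s+") (line-seq rdr))))

;; A StringReader stands in for a real file here (assumption for the demo).
;; prn, unlike println, prints strings with their quotes.
(prn (line-words (java.io.StringReader. "International donations are\ngratefully accepted")))
```

Using prn instead of println is what makes the quotes around each word visible, which may be all that the expected output in the question was missing.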

Edit

As @deterb pointed out, let's check the source code of line-seq:

(defn line-seq
  "Returns the lines of text from rdr as a lazy sequence of strings.
   rdr must implement java.io.BufferedReader."
  {:added "1.0"
   :static true}
  [^java.io.BufferedReader rdr]
  (when-let [line (.readLine rdr)]
    (cons line (lazy-seq (line-seq rdr)))))

Modeled on that, I put together a quick char-seq:

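(defn char-seq
  "Returns the characters read from rdr as a lazy sequence."
  [^java.io.Reader rdr]
  ;; .read returns an int, or -1 at end of stream
  (let [chr (.read rdr)]
    (when (>= chr 0)
      ;; convert the int to a character before adding it to the sequence
      (cons (char chr) (lazy-seq (char-seq rdr))))))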

I originally thought this char-seq would read all the characters into memory[1], but either way it shows that you can call .read directly on a BufferedReader. So you can write your code like this:

(let [chr (.read rdr)]   ; .read returns an int, or -1 at end of stream
  (when (>= chr 0)
    ;; do your work here; (char chr) gives the character
    ))
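The snippet above reads a single character. A minimal sketch of repeating it until the end of the stream, using loop/recur, might look like the following. The helper name process-chars and its callback argument f are hypothetical, and a StringReader again stands in for a real file.

```clojure
(require '[clojure.java.io :as io])

(defn process-chars
  "Hypothetical helper: calls f on each character read from source, in order."
  [source f]
  (with-open [rdr (io/reader source)]
    (loop [c (.read rdr)]
      ;; .read returns an int, or -1 once the end of the stream is reached
      (when (>= c 0)
        (f (char c))
        (recur (.read rdr))))))

;; Prints each character of the input on its own line.
(process-chars (java.io.StringReader. "abc") println)
```

This keeps only one character in hand at a time, at the cost of being side-effect driven rather than returning a sequence.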

What do you think?

[1] According to @dimagog's comment, char-seq does not read all the characters into memory: thanks to lazy-seq, each read is deferred until the sequence is consumed.
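One way to check the laziness described in the footnote is to take a few elements and then see where the reader is. This is only a sketch: it redefines a char-seq along the lines of the one above (converting each int to a character) and uses a StringReader in place of a file.

```clojure
(defn char-seq
  "Lazy sequence of the characters read from rdr (sketch for this demo)."
  [^java.io.Reader rdr]
  (let [chr (.read rdr)]
    (when (>= chr 0)
      (cons (char chr) (lazy-seq (char-seq rdr))))))

(def rdr (java.io.StringReader. "hello world"))

;; Realize only the first five elements of the sequence.
(def first-five (doall (take 5 (char-seq rdr))))
(prn first-five)

;; Read straight from the reader again: if only five characters were
;; consumed above, this is the sixth character of the input.
(prn (char (.read rdr)))
```

Note that laziness only avoids reading ahead; holding on to the head of the sequence while walking it would still keep every character already read in memory.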

xiaowl answered Sep 18 '22 03:09