I need to read a large file (~1 GB), process it, and save the results to a database. My solution looks like this:
data.txt
format: [id],[title]\n
1,Foo
2,Bar
...
code
(ns test.core
  (:require [clojure.java.io :as io]
            [clojure.string :refer [split]]))

(defn parse-line
  "Turns a line like \"1,Foo\" into {:id \"1\", :title \"Foo\"}."
  [line]
  (let [values (split line #",")]
    (zipmap [:id :title] values)))

(defn run
  []
  ;; Note: the JVM does not expand "~", so build the home path explicitly.
  (with-open [reader (io/reader (str (System/getProperty "user.home") "/data.txt"))]
    (insert-batch (map parse-line (line-seq reader)))))
; insert-batch just saves a collection of records into the database
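insert-batch itself is not shown in the question; a minimal sketch of what it could look like, assuming clojure.java.jdbc and a hypothetical records table with id and title columns:

(require '[clojure.java.jdbc :as jdbc])

;; Hypothetical connection spec; adjust for your database.
(def db-spec {:dbtype "postgresql" :dbname "mydb" :user "me"})

(defn insert-batch
  "Inserts parsed records in a single multi-row INSERT.
  The :records table and db-spec are assumptions, not from the question."
  [records]
  (jdbc/insert-multi! db-spec :records
                      [:id :title]
                      (map (juxt :id :title) records)))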
But this code does not work well, because it parses every line and then sends them all to the database as one giant batch.
I think the ideal solution would be: read a line -> parse it -> collect 1000 parsed lines -> batch-insert them into the database -> repeat until there are no lines left. Unfortunately, I have no idea how to implement this.
One suggestion:
Use line-seq to get a lazy sequence of lines,
use map to parse each line,
(so far this matches what you are doing)
use partition-all to partition your lazy sequence of parsed lines into batches, and then
use insert-batch with doseq to write each batch to the database.
And an example:
(doseq [batch (->> (line-seq reader)
                   (map parse-line)
                   (partition-all 1000))]
  (insert-batch batch))
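Because line-seq is lazy, the whole pipeline must be consumed while the reader is still open. Plugged into the run function from the question (a sketch; the file path is the same assumption as above):

(defn run
  []
  (with-open [reader (io/reader (str (System/getProperty "user.home") "/data.txt"))]
    (doseq [batch (->> (line-seq reader)
                       (map parse-line)
                       (partition-all 1000))] ; groups of at most 1000 parsed lines
      (insert-batch batch))))

doseq consumes the lazy sequence eagerly inside with-open, so each batch is inserted and then becomes eligible for garbage collection, keeping memory use bounded; partition-all (rather than partition) ensures the final, possibly short batch is not dropped.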