I need to read a large file (~1GB), process it, and save the results to a database. My solution looks like this:
data.txt
format: [id],[title]\n
1,Foo
2,Bar
...
code
(ns test.core
  (:require [clojure.java.io :as io]
            [clojure.string :refer [split]]))

(defn parse-line
  [line]
  (let [values (split line #",")]
    (zipmap [:id :title] values)))
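As an aside, if the database expects id as a number (an assumption; the question does not say) and titles may contain commas, a variant of parse-line could convert the id and split only on the first comma:

(defn parse-line
  [line]
  ;; limit 2 splits only on the first comma, so commas in titles survive
  (let [[id title] (split line #"," 2)]
    {:id (Long/parseLong id)  ; assumes every id fits in a long
     :title title}))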
(defn run
  []
  ;; note: java.io does not expand ~ in paths; use an absolute path instead
  (with-open [reader (io/reader "~/data.txt")]
    (insert-batch (map parse-line (line-seq reader)))))
;; insert-batch just saves a vector of records into the database
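For reference, insert-batch is left abstract in the question; one possible sketch uses clojure.java.jdbc (the db-spec and table name below are hypothetical):

(require '[clojure.java.jdbc :as jdbc])

;; hypothetical connection spec and table; adjust for your database
(def db-spec {:dbtype "h2" :dbname "testdb"})

(defn insert-batch
  [records]
  ;; insert-multi! with a sequence of maps runs the inserts in one transaction
  (jdbc/insert-multi! db-spec :records records))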
But this code does not work well, because it first parses all the lines and then sends them to the database in one giant batch.
I think the ideal solution would be: read a line -> parse the line -> collect 1000 parsed lines -> batch-insert them into the database -> repeat until there are no lines left. Unfortunately, I have no idea how to implement this.
One suggestion:
Use line-seq to get a lazy sequence of lines,
use map to parse each line,
(so far this matches what you are doing)
use partition-all to partition your lazy sequence of parsed lines into batches, and then
use insert-batch with doseq to write each batch to the database.
And an example:
(doseq [batch (->> (line-seq reader)
                   (map parse-line)
                   (partition-all 1000))]
  (insert-batch batch))
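Putting it together, run could look like this (a sketch reusing parse-line and insert-batch from the question; the doseq has to stay inside with-open, because line-seq is lazy and the reader must remain open until the last batch is consumed):

(defn run
  []
  (with-open [reader (io/reader "/path/to/data.txt")]  ; placeholder path
    (doseq [batch (->> (line-seq reader)
                       (map parse-line)
                       (partition-all 1000))]
      (insert-batch batch))))

partition-all yields lazy batches of at most 1000 records, and doseq realizes them one at a time without retaining earlier batches, so memory use stays bounded regardless of file size.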