I am trying to insert records into postgres DB, and its taking about 3 hours while it takes 40seconds using python psycopg2 and cursor.copy_from method
What is wrong with my code, using clojure.java.jdbc/db-do-prepared also takes about 3 hours too. Please help!
File size is 175M and it has 409,854 records
(defn-
str<->int [str]
(let [n (read-string str)]
(if (integer? n) n)))
(with-open [file (reader "/path/to/foo.txt")]
(try
(doseq [v (clojure-csv.core/parse-csv file)]
(clojure.java.jdbc/insert! db :records
nil
[(v 0) (v 1) (v 2) (str<->int (v 3))]))
(println "Records inserted successfully")
(Exception e
(println (.getNextException e) e))))
It's probably due to not using batching in your Clojure version. You insert rows one by one each triggering the commit.
If you want to do it in Clojure than you need to partition
rows from CSV files and insert!
each chunk as one batch commit. You need to use the last arity version accepting multiple col-val-vec
s. Sample code (not checked, just to show the idea):
(defn row->col-spec [row]
[(v 0) (v 1) (v 2) (str<->int (v 3))])
(with-open [csv-file (reader "/path/to/foo.txt")]
(try
(->> csv-file
(clojure-csv.core/parse-csv)
(map row->col-spec)
(partition 50)
(map (fn [batch] clojure.java.jdbc/insert! db :records ["col1" "col2" "col3" "col4"] batch))
(dorun))
(catch Exception e
(println e))))
If you don't have to do it in Clojure then using psql
's COPY
command seems to be the easiest and fastest option:
COPY records FROM '/path/to/foo.txt' WITH (FORMAT csv, DELIMITER ',', NULL 'NULL');
After 4 years, decided to come back to this problem and share a guide to the solution, I am sure this will help someone get started.
You can take a look at clojure.java.jdbc/insert-multi! and edit appropriately to suite the column types in your database
(let [from "/path/to/foo.txt"
to "/path/to/temp/foo.txt"]
(with-open [reader (io/reader from)
writer (io/writer to)]
(doall
(->> (csv/read-csv reader)
;(drop 1) ;if theres header
(map #(list (nth % 0 nil) (nth % 2 nil) (nth % 3 nil)))
(csv/write-csv writer))))
(let [fstream (slurp to)
streamarray (map #(str/split % #",")
(str/split-lines fstream))]
(clojure.java.jdbc/insert-multi! pg-db ;connection or {:datasource hk-cp}
:tbl_cdrs_da ;table name
[:origin_node_type :origin_transaction_id :da_ua_id] ;colums
streamarray))) ;array
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With