Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Insert file records into postgres db using clojure jdbc is taking long time compared to python psycopg2

I am trying to insert records into postgres DB, and its taking about 3 hours while it takes 40seconds using python psycopg2 and cursor.copy_from method

What is wrong with my code, using clojure.java.jdbc/db-do-prepared also takes about 3 hours too. Please help!

File size is 175M and it has 409,854 records

(defn-
  str<->int [str]
  (let [n (read-string str)]
    (if (integer? n) n)))

(with-open [file (reader "/path/to/foo.txt")]
    (try
      (doseq [v (clojure-csv.core/parse-csv file)]

        (clojure.java.jdbc/insert! db  :records 
                      nil
                      [(v 0) (v 1) (v 2) (str<->int (v 3))]))
      (println "Records inserted successfully")
      (Exception e
        (println (.getNextException e) e))))
like image 660
stry-kai Avatar asked Mar 17 '16 11:03

stry-kai


2 Answers

It's probably due to not using batching in your Clojure version. You insert rows one by one each triggering the commit.

If you want to do it in Clojure than you need to partition rows from CSV files and insert! each chunk as one batch commit. You need to use the last arity version accepting multiple col-val-vecs. Sample code (not checked, just to show the idea):

(defn row->col-spec [row]
  [(v 0) (v 1) (v 2) (str<->int (v 3))])

(with-open [csv-file (reader "/path/to/foo.txt")]
  (try
    (->> csv-file
         (clojure-csv.core/parse-csv)
         (map row->col-spec)
         (partition 50)
         (map (fn [batch] clojure.java.jdbc/insert! db :records ["col1" "col2" "col3" "col4"] batch))
         (dorun))
    (catch Exception e
      (println e))))

If you don't have to do it in Clojure then using psql's COPY command seems to be the easiest and fastest option:

COPY records FROM '/path/to/foo.txt' WITH (FORMAT csv, DELIMITER ',',  NULL 'NULL');
like image 141
Piotrek Bzdyl Avatar answered Sep 29 '22 11:09

Piotrek Bzdyl


After 4 years, decided to come back to this problem and share a guide to the solution, I am sure this will help someone get started.

You can take a look at clojure.java.jdbc/insert-multi! and edit appropriately to suite the column types in your database

(let [from "/path/to/foo.txt"
      to "/path/to/temp/foo.txt"]
  (with-open [reader (io/reader from)
              writer (io/writer to)]
    (doall
      (->> (csv/read-csv reader)
           ;(drop 1)   ;if theres header
           (map #(list (nth % 0 nil) (nth % 2 nil)  (nth % 3 nil)))
           (csv/write-csv writer))))
  (let [fstream (slurp to)
        streamarray (map #(str/split % #",")
                         (str/split-lines fstream))]
    (clojure.java.jdbc/insert-multi! pg-db              ;connection or {:datasource hk-cp}
                                     :tbl_cdrs_da                 ;table name
                                     [:origin_node_type :origin_transaction_id :da_ua_id] ;colums
                                     streamarray)))               ;array
like image 21
stry-kai Avatar answered Sep 29 '22 11:09

stry-kai