
Fast CSV reading in Common Lisp

What is the fastest way to read a CSV file in CL such that: 1) all fields of the first line go into one array called column-names, 2) the first field of every following line goes into another array called row-names, and 3) all remaining fields go into a third array called values?

My file has the following form, just with a lot more columns and rows:

"";"ES1 Index";"VG1 Index";"TY1 Comdty";"RX1 Comdty";"GC1 Comdty"
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441.7
"1999-01-07";1432.32;3106.08;66.25;86.22;447.67

And the result I would like is:

#("1999-01-04" "1999-01-05" "1999-01-06" "1999-01-07" )
#("" "ES1 Index" "VG1 Index" "TY1 Comdty" "RX1 Comdty" "GC1 Comdty")
#(1391.12 3034.53 66.515625 86.2 441.39 1404.86 3072.41 66.3125 86.17 440.63
  1435.12 3156.59 66.4375 86.32 441.7 1432.32 3106.08 66.25 86.22 447.67)

Are you aware of some CL library that does so already? Are there any general issues regarding I/O performance, maybe compiler-specific, that I should be aware of?

Here is the way I am doing it now:

(with-open-file (stream "my-file.csv" :direction :input)
  (let* ((header (read-line stream nil))
         (columns-list (mapcar #'read-from-string
                               (cl-ppcre:split ";" header)))
         (number-of-columns (length columns-list))
         (column-names (make-array number-of-columns
                                   :initial-contents columns-list))
         (row-names (make-array 1 :adjustable t :fill-pointer 0))
         (values (make-array 1 :adjustable t :fill-pointer 0)))
    ;; Make the reader treat #\; as whitespace between fields.
    (set-syntax-from-char #\; #\Space)
    (loop
      :for reader = (read stream nil stream)
      :until (eq reader stream)
      ;; First field of the line is the row name, the rest are values.
      :do (vector-push-extend reader row-names)
          (loop
            :for count :from 2 :upto number-of-columns
            :do (vector-push-extend (read stream nil) values))
      :finally (return (values row-names
                               column-names
                               values)))))

Note: I wouldn't use set-syntax-from-char in real code, I am using it just for the sake of this example.

Danny Zuko asked Oct 19 '22 10:10

2 Answers

I suspect that the I/O is the slowest part here. You can probably get faster I/O if you use READ-SEQUENCE rather than calling READ-LINE repeatedly. So your code might look something like this:

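(with-open-file (s "my-file.csv")
  (let* ((len (file-length s))
         ;; A string rather than a general vector, since S is a character stream.
         (data (make-string len)))
    ;; FILE-LENGTH can overcount characters in multi-byte encodings,
    ;; so keep only the part READ-SEQUENCE actually filled.
    (subseq data 0 (read-sequence data s))))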

Then split data by newlines and add your logic.
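
As a rough illustration of that splitting step, here is a minimal sketch that uses only standard functions. The names SPLIT-ON, UNQUOTE, PARSE-NUMBER and PARSE-CSV-STRING are made up for this example and belong to no library. It assumes that quoted fields never contain the separator, a newline or an embedded quote, and that every field after the first one in a data line is a number the Lisp reader can parse.

```lisp
;; Hypothetical helpers for illustration; none of these names come from a library.

(defun split-on (char string)
  "Return the substrings of STRING delimited by CHAR as a list."
  (loop :with start = 0
        :for pos = (position char string :start start)
        :collect (subseq string start pos)
        :while pos
        :do (setf start (1+ pos))))

(defun unquote (field)
  "Strip leading and trailing double quotes from FIELD."
  (string-trim "\"" field))

(defun parse-number (field)
  "Read FIELD with the Lisp reader, with #. evaluation disabled."
  (let ((*read-eval* nil))
    (read-from-string field)))

(defun parse-csv-string (data &key (separator #\;))
  "Split DATA (the whole file as one string) into three vectors:
row names, column names and values in row-major order."
  (let* ((lines (remove ""
                        (mapcar (lambda (line)
                                  ;; Tolerate CRLF line endings.
                                  (string-right-trim '(#\Return) line))
                                (split-on #\Newline data))
                        :test #'string=))
         (column-names (map 'vector #'unquote
                            (split-on separator (first lines))))
         (row-names (make-array 0 :adjustable t :fill-pointer 0))
         (vals (make-array 0 :adjustable t :fill-pointer 0)))
    (dolist (line (rest lines))
      (destructuring-bind (name . fields) (split-on separator line)
        (vector-push-extend (unquote name) row-names)
        (dolist (field fields)
          (vector-push-extend (parse-number field) vals))))
    (values row-names column-names vals)))
```

Calling (parse-csv-string data) on the string read by the READ-SEQUENCE snippet returns the three vectors as multiple values. Whether this beats READ on the stream depends on the implementation, so it is worth measuring both.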

Whether that helps or not, it would be helpful to profile your code, e.g. with sb-sprof (SBCL's statistical profiler), to see where most of the time is being spent.
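
For reference, a profiling run under SBCL could look roughly like the sketch below. It is SBCL-specific, and PARSE-MANY is a made-up stand-in workload; you would put the call to your own CSV reader in its place.

```lisp
;; SBCL only. Load the profiler before the reader sees the SB-SPROF: prefix.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :sb-sprof))

(defun parse-many (n)
  "Hypothetical workload: read the same number N times and sum the results."
  (let ((*read-eval* nil))
    (loop :repeat n
          :sum (read-from-string "1391.12"))))

;; Sample the body and print a flat report (one line per function)
;; to standard output when it finishes.
(sb-sprof:with-profiling (:max-samples 10000 :report :flat)
  (parse-many 200000))
```

If the report is dominated by reader functions rather than by I/O, the parsing step is the better place to optimize.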

blambert answered Oct 22 '22 23:10


For reading CSV files, I find the cl-csv package (https://github.com/AccelerationNet/cl-csv) very useful and fast. For instance, the following code could be used to solve your problem:

(let ((data (cl-csv:read-csv #P"my-file.csv" :separator #\;)))
  ;; COERCE instead of (apply #'vector ...): APPLY is bounded by
  ;; CALL-ARGUMENTS-LIMIT, which a large file can exceed.
  (values (coerce (first data) 'vector)
          (coerce (rest (mapcar #'first data)) 'vector)
          (coerce (mapcar #'read-from-string
                          (loop :for row :in (rest data)
                                :append (rest row)))
                  'vector)))

cl-csv:read-csv returns a list containing, for each row, a list of strings that are the contents of the cells.

Renzo answered Oct 22 '22 23:10