What is the fastest way to read a csv file in CL in a way such that: 1) all fields in the first line go into one array called column-names 2) the first field of each of all following lines goes into another array called row-names 3) all other fields go into another array called values ?
My file has the following form, just with a lot more columns and rows:
"";"ES1 Index";"VG1 Index";"TY1 Comdty";"RX1 Comdty";"GC1 Comdty"
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441.7
"1999-01-07";1432.32;3106.08;66.25;86.22;447.67
And the result I would like is:
#("1999-01-04" "1999-01-05" "1999-01-06" "1999-01-07" )
#("" "ES1 Index" "VG1 Index" "TY1 Comdty" "RX1 Comdty" "GC1 Comdty")
#(1391.12 3034.53 66.515625 86.2 441.39 1404.86 3072.41 66.3125 86.17 440.63
1435.12 3156.59 66.4375 86.32 441.7 1432.32 3106.08 66.25 86.22 447.67)
Are you aware of some CL library that does so already? Are there any general issues regarding I/O performance, maybe compiler-specific, that I should be aware of?
Here is the way I am doing it now:
(with-open-file (stream "my-file.csv" :direction :input)
(let* ((header (read-line stream nil))
(columns-list (mapcar #'read-from-string
(cl-ppcre:split ";" header)))
(number-of-columns (length columns-list))
(column-names (make-array number-of-columns
:initial-contents columns-list))
(rownames (make-array 1 :adjustable t :fill-pointer 0))
(values (make-array 1 :adjustable t :fill-pointer 0)))
(set-syntax-from-char #\; #\ )
(loop
:for reader = (read stream nil stream)
:until (eq reader stream)
:do (progn (vector-push-extend reader row-names)
(loop
:for count :from 2 :upto number-of-columns
:do (vector-push-extend (read stream nil)
values)))
:finally (return (values row-names
column-names
values)))))
Note: I wouldn't use set-syntax-from-char in real code, I am using it just for the sake of this example.
I suspect that the I/O is the slowest part here. You can probably get faster I/O if you use READ-SEQUENCE rather than calling READ-LINE repeatedly. So your code might look something like this:
(with-open-file (s "my-file.csv")
(let* ((len (file-length s))
(data (make-array len)))
(read-sequence data s)
data))
Then split data
by newlines and add your logic.
Whether that helps or not, it'd helpful for you to profile your code, e.g. with :sb-sprof
, to see where most of the time is being spent.
To read csv files, I find very useful and fast the cl-csv package (https://github.com/AccelerationNet/cl-csv). For instance, to solve your problem, the following code could be used:
(let ((data (cl-csv:read-csv #P"my-file.csv" :separator #\;)))
(values (apply #'vector (first data))
(apply #'vector (rest (mapcar #'first data)))
(apply #'vector
(mapcar #'read-from-string (loop :for row :in (rest data)
:append (rest row))))))
cl-csv:read-csv
returns a list contaning, for each row, a list of strings that are the contents of the cells.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With