I have a .csv file, example.csv, with 8000 columns x 40000 rows. The CSV file has a string header for each column. All fields contain integer values between 0 and 10. When I try to load this file with read.csv it turns out to be extremely slow. It is also very slow when I add the parameter nrows=100. I wonder if there is a way to accelerate read.csv, or to use some other function instead of read.csv to load the file into memory as a matrix or data.frame?
Thanks in advance.
If your CSV only contains integers, you should use scan instead of read.csv, since ?read.csv says:
‘read.table’ is not the right tool for reading large matrices, especially those with many columns: it is designed to read _data frames_ which may have columns of very different classes. Use ‘scan’ instead for matrices.
Since your file has a header, you will need skip=1, and it will probably be faster if you set what=integer().
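A minimal sketch of that approach, assuming example.csv is comma-separated with a single header line as described in the question:

    # Read the header line separately, then the data as one long integer vector
    header <- scan("example.csv", what = character(), sep = ",", nlines = 1, quiet = TRUE)
    x <- scan("example.csv", what = integer(), sep = ",", skip = 1, quiet = TRUE)

    # scan() returns a flat vector; reshape it into a matrix
    # (byrow = TRUE because the file stores one row per line)
    m <- matrix(x, ncol = length(header), byrow = TRUE, dimnames = list(NULL, header))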
If you must use read.csv and speed/memory consumption are a concern, setting the colClasses argument is a huge help.
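A short sketch, again using the dimensions given in the question; supplying nrows (note the argument is nrows, not nrow) also lets R allocate the result up front:

    # Declaring every column as integer spares read.csv its type-guessing pass
    df <- read.csv("example.csv", colClasses = rep("integer", 8000), nrows = 40000)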