I have a .csv file, example.csv, with 8000 columns x 40000 rows. The CSV file has a string header for each column. All fields contain integer values between 0 and 10. When I try to load this file with read.csv it turns out to be extremely slow. It is also very slow when I add the parameter nrows=100. I wonder if there is a way to accelerate read.csv, or to use some other function instead of read.csv to load the file into memory as a matrix or data.frame?
Thanks in advance.
If your CSV only contains integers, you should use scan instead of read.csv, since ?read.csv says:
‘read.table’ is not the right tool for reading large matrices, especially those with many columns: it is designed to read _data frames_ which may have columns of very different classes. Use ‘scan’ instead for matrices.
Since your file has a header, you will need skip=1, and it will probably be faster if you set what=integer().
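A minimal sketch of that approach, assuming example.csv is comma-separated with a single header line as described in the question:

    # Read the header line separately, then the data as one long integer vector
    header <- scan("example.csv", what = character(), sep = ",", nlines = 1, quiet = TRUE)
    x <- scan("example.csv", what = integer(), sep = ",", skip = 1, quiet = TRUE)

    # scan() returns a flat vector; reshape it into a matrix
    # (byrow = TRUE because the file stores one row per line)
    m <- matrix(x, ncol = length(header), byrow = TRUE, dimnames = list(NULL, header))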
If you must use read.csv and speed/memory consumption are a concern, setting the colClasses argument is a huge help.
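A short sketch, again using the dimensions given in the question; supplying nrows (note the argument is nrows, not nrow) also lets R allocate the result up front:

    # Declaring every column as integer spares read.csv its type-guessing pass
    df <- read.csv("example.csv", colClasses = rep("integer", 8000), nrows = 40000)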