
How do I import a large (6 GB) .csv file into R efficiently and quickly, without the R REPL crashing?

Tags:

r

csv

I have a large .csv file that I need to import into R in order to do some data manipulation on it. I'm calling read.csv("file.csv") and assigning the result to a variable, MyData. However, when I run this in the R REPL, the session crashes. Is there a way to read and process a .csv file in R efficiently and quickly that won't crash the terminal? If there isn't, should I just be using Python instead?


1 Answer

R will crash if you try to load a file that is larger than your available memory, so you should check that you have at least 6 GB of RAM free (a 6 GB .csv takes up roughly 6 GB in memory as well). Python would have the same problem (someone asked essentially the same question about Python a few years ago).
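As a quick sanity check before loading anything (a minimal sketch; "file.csv" is a placeholder path), you can compare the size of the file on disk against the RAM you have free:

    # File size on disk in GB -- "file.csv" is a placeholder path.
    # If this number is close to (or above) your free RAM, a plain
    # read.csv() call is likely to exhaust memory.
    file.size("file.csv") / 1024^3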

For reading large .csv files, use either readr::read_csv() or data.table::fread(); both are much faster than base::read.table() (which read.csv() wraps).
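For example (a minimal sketch, assuming the file is called "file.csv" in the working directory; adjust the path to your data):

    library(data.table)
    # fread() auto-detects the separator and column types and is usually the
    # fastest option for a file this size; it returns a data.table.
    MyData <- fread("file.csv")

    # Alternatively, readr::read_csv() returns a tibble.
    library(readr)
    MyData <- read_csv("file.csv")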

readr::read_csv_chunked() supports reading .csv files in chunks, so if you don't need the whole dataset in memory at once, that may help. You could also read only the columns of interest to keep the memory footprint smaller.
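A sketch of both ideas (the column names and the filter condition below are made-up placeholders, not anything from your data):

    library(readr)

    # Process the file 100,000 rows at a time and keep only the rows you need,
    # so the full 6 GB never has to sit in memory at once.
    filtered <- read_csv_chunked(
      "file.csv",
      callback = DataFrameCallback$new(function(chunk, pos) {
        chunk[chunk$price > 100, ]   # placeholder filter
      }),
      chunk_size = 100000
    )

    # Reading only the columns of interest also shrinks the memory footprint;
    # data.table::fread() takes a `select` argument for this.
    small <- data.table::fread("file.csv", select = c("id", "price"))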
