I have a big CSV file and it takes ages to read. Can I read it in parallel in R, using a package like "parallel" or similar? I've tried mclapply, but it is not working.
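(For context, one common way to attempt this with parallel::mclapply is to have each worker read a block of rows with data.table::fread and bind the pieces afterwards. A minimal sketch; the file name, row count, and chunk count are all assumptions for illustration:)

library(parallel)
library(data.table)

path     <- "myFile.csv"   # assumed file name
n_rows   <- 1e6            # assumed number of data rows (excluding the header)
n_chunks <- 4
bounds   <- floor(seq(0, n_rows, length.out = n_chunks + 1))
hdr      <- names(fread(path, nrows = 0))   # read the column names once

chunks <- mclapply(seq_len(n_chunks), function(i) {
  fread(path,
        skip      = bounds[i] + 1,              # +1 skips the header line
        nrows     = bounds[i + 1] - bounds[i],  # rows in this block
        header    = FALSE,
        col.names = hdr)
}, mc.cores = n_chunks)   # note: mc.cores > 1 is not supported on Windows

dt <- rbindlist(chunks)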
Based on a comment from the OP, fread from the data.table package worked. Here's the code:
library(data.table)        # provides fread()
dt <- fread("myFile.csv")  # fast CSV reader; drop-in replacement for read.csv here
In the OP's case, reading a 1.2 GB file took about 4-5 minutes with read.csv and just 14 seconds with fread.
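To reproduce that comparison on your own machine, you can time both readers directly with base R's system.time(); the file name is an assumption:

library(data.table)
system.time(df <- read.csv("myFile.csv"))   # base R reader
system.time(dt <- fread("myFile.csv"))      # data.table reader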
Update 29 January 2021: It appears that fread() now works in parallel, per a Tweet from the package's creator.
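If you want to verify or control that parallelism, data.table exposes its thread pool; the thread counts below are illustrative:

library(data.table)
getDTthreads()                            # threads data.table will use by default
setDTthreads(4)                           # set the pool for subsequent calls
dt <- fread("myFile.csv", nThread = 4)    # or cap it per call via fread's nThread argument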