Can I read 1 big CSV file in parallel in R? [duplicate]

I have a big csv file and it takes ages to read. Can I read this in parallel in R using a package like "parallel" or related? I've tried using mclapply, but it is not working.

Ansjovis86 asked Apr 29 '15 15:04


1 Answer

Based upon the comment by the OP, fread from the data.table package worked. Here's the code:

library(data.table)
dt <- fread("myFile.csv")

In the OP's case, reading a 1.2 GB file took about 4-5 minutes with read.csv and just 14 seconds with fread.

Update 29 January 2021: It appears that fread() now works in parallel per a Tweet from the package's creator.
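As a minimal sketch of that thread support, assuming a recent data.table release: nThread and setDTthreads() are the package's documented controls for parallel parsing, and the small temporary CSV written here is just a hypothetical stand-in for the OP's large file.

```r
library(data.table)

# Write a small sample CSV to a temporary file (stand-in for the
# OP's 1.2 GB "myFile.csv").
tmp <- tempfile(fileext = ".csv")
fwrite(data.table(x = 1:1000, y = rnorm(1000)), tmp)

# 0 tells data.table to use all available logical cores.
setDTthreads(0)

# fread() parses the file in parallel; nThread defaults to
# getDTthreads(), shown explicitly here for clarity.
dt <- fread(tmp, nThread = getDTthreads())
nrow(dt)   # 1000 rows read back
```

On a file this small the threading overhead dominates, so the benefit only shows up on large inputs like the OP's.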

Richard Erickson answered Oct 09 '22 07:10