Can I read 1 big CSV file in parallel in R? [duplicate]

I have a big csv file and it takes ages to read. Can I read this in parallel in R using a package like "parallel" or related? I've tried using mclapply, but it is not working.

Ansjovis86 asked Apr 29 '15 15:04


1 Answer

Based upon the comment by the OP, fread from the data.table package worked. Here's the code:

library(data.table)
dt <- fread("myFile.csv")

In the OP's case, reading a 1.2 GB file took about 4-5 minutes with read.csv and just 14 seconds with fread.

Update 29 January 2021: It appears that fread() now works in parallel per a Tweet from the package's creator.
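As a minimal sketch of that thread support, assuming a recent data.table release: nThread and setDTthreads() are the package's documented controls for parallel parsing, and the small temporary CSV written here is just a hypothetical stand-in for the OP's large file.

```r
library(data.table)

# Write a small sample CSV to a temporary file (stand-in for the
# OP's 1.2 GB "myFile.csv").
tmp <- tempfile(fileext = ".csv")
fwrite(data.table(x = 1:1000, y = rnorm(1000)), tmp)

# 0 tells data.table to use all available logical cores.
setDTthreads(0)

# fread() parses the file in parallel; nThread defaults to
# getDTthreads(), shown explicitly here for clarity.
dt <- fread(tmp, nThread = getDTthreads())
nrow(dt)   # 1000 rows read back
```

On a file this small the threading overhead dominates, so the benefit only shows up on large inputs like the OP's.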

Richard Erickson answered Oct 09 '22 07:10