I have 900,000 csv files which I want to combine into one big data.table. For this I created a for loop which reads every file one by one and adds it to the data.table. The problem is that it performs too slowly, and the amount of time used seems to grow exponentially. It would be great if someone could help me make the code run faster. Each of the csv files has 300 rows and 15 columns.
The code I am using so far:
library(data.table)
setwd("~/My/Folder")
WD <- "~/My/Folder"
data <- data.table(read.csv(text = "X,Field1,PostId,ThreadId,UserId,Timestamp,Upvotes,Downvotes,Flagged,Approved,Deleted,Replies,ReplyTo,Content,Sentiment"))
csv.list <- list.files(WD)
k <- 1
for (i in csv.list) {
  temp.data <- read.csv(i)
  data <- data.table(rbind(data, temp.data))
  if (k %% 100 == 0)
    print(k / length(csv.list))
  k <- k + 1
}
Presuming your files are conventional csv, I'd use data.table::fread since it's faster. If you're on a Linux-like OS, I would use the fact that it allows shell commands. Presuming your input files are the only csv files in the folder, I'd do:
dt <- fread("tail -n +2 -q ~/My/Folder/*.csv")
You'll need to set the column names manually afterwards.
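Since the shell command strips the header line from every file, one way to restore the names is setnames(), which renames columns by reference. A minimal sketch, where a small two-column table stands in for the real 15-column result:

```r
library(data.table)

# Stand-in for the table returned by the shell-based fread above,
# which arrives with default names V1, V2, ...
dt <- data.table(V1 = 1:2, V2 = c("a", "b"))

# setnames() renames the columns by reference (no copy of the data)
setnames(dt, c("X", "Field1"))
```

With the real data you'd pass the full 15-name vector from the question (X, Field1, PostId, and so on).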
If you wanted to keep things in R, I'd use lapply and rbindlist:
lst <- lapply(csv.list, fread)
dt <- rbindlist(lst)
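The reason this is so much faster than the loop in the question is that rbindlist() allocates the final table once, whereas rbind() inside a loop recopies the ever-growing table on every iteration. A self-contained sketch, with two throwaway temp files standing in for the 900,000 real ones:

```r
library(data.table)

# Two temp files standing in for the real inputs
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
fwrite(data.table(a = 1:3, b = letters[1:3]), f1)
fwrite(data.table(a = 4:6, b = letters[4:6]), f2)

# Read every file into a list, then bind once.
# use.names/fill guard against files whose columns differ in order
# or are missing entirely.
dt <- rbindlist(lapply(c(f1, f2), fread), use.names = TRUE, fill = TRUE)
```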
You could also use plyr::ldply:
dt <- setDT(ldply(csv.list, fread))
This has the advantage that you can use .progress = "text" to get a readout of progress while reading.
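Putting that together, and remembering that ldply() lives in the plyr package (which has to be loaded first), the call with a progress bar would look something like this sketch, again with a temp file standing in for the real folder contents:

```r
library(plyr)        # for ldply()
library(data.table)  # for fread() and setDT()

# Temp file standing in for the real ~/My/Folder contents
f1 <- tempfile(fileext = ".csv")
fwrite(data.table(a = 1:3), f1)
csv.list <- c(f1)

# .progress = "text" prints a text progress bar as files are read;
# setDT() then converts the resulting data.frame to a data.table in place
dt <- setDT(ldply(csv.list, fread, .progress = "text"))
```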
All of the above assume that the files all have the same format and have a header row.