I have a set of CSV files with duplicate entries, which I need to remove before rewriting the files with the same names and format.
Here is what I have done so far:
filenames <- list.files(pattern = "\\.csv$")
datalist <- lapply(filenames, function(x) read.csv(file = x, header = FALSE))
unique.list <- lapply(datalist, unique)
I'm stuck on separating the data frames in the list and rewriting them under their original names. There is a similar question, but after hours of trying I couldn't follow the approach.
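A quick check on toy data (made up here, not from my actual files) confirms that unique() works row-wise on a data frame, so unique.list should hold the de-duplicated tables:

```r
# Toy data frame with one repeated row (illustrative only)
df <- data.frame(id = c(1, 1, 2), val = c("a", "a", "b"))
nrow(unique(df))  # 2: the duplicated row is dropped
```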
I'd definitely use a for loop here. Shhhhhh, don't tell anyone I said that. Why? Three reasons:

1. You are calling write.csv for its side-effect, not its return value, i.e. you want a file to be written to disk. Use *apply when you want a return value from your function.
2. There is no meaningful speed advantage to an *apply loop over a for loop here.
3. *apply functions will swallow memory on each iteration of the loop and are not guaranteed to free it up until all iterations have completed. In a for loop the memory is freed up at the start of the next iteration if you are overwriting objects inside the loop. If you are working with big csv files this could be an advantage. I will try to find a link to an answer where for solved a problem that lapply could not due to memory issues.

So all you need for my solution, given your de-duplicated data list, is...
for (i in seq_along(filenames)) {
  # row.names = FALSE avoids prepending an index column to the rewritten file
  write.csv(unique.list[[i]], filenames[i], row.names = FALSE)
}
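One wrinkle: since the files were read with header = FALSE, write.csv will add a "V1","V2",... header line to the output. If the rewritten files must stay headerless, write.table with col.names = FALSE preserves the original format. A self-contained sketch of the full round trip (the demo directory and file name are made up for illustration):

```r
# Create a throwaway headerless CSV with a duplicate row to demonstrate on
dir <- file.path(tempdir(), "dedup-demo")
dir.create(dir, showWarnings = FALSE)
path <- file.path(dir, "demo.csv")
write.table(data.frame(a = c(1, 1, 2), b = c("x", "x", "y")),
            path, sep = ",", row.names = FALSE, col.names = FALSE)

# Read every CSV in the directory, drop duplicate rows
filenames   <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
datalist    <- lapply(filenames, read.csv, header = FALSE)
unique.list <- lapply(datalist, unique)

# Rewrite each file in place; col.names = FALSE keeps the files headerless
for (i in seq_along(filenames)) {
  write.table(unique.list[[i]], filenames[i], sep = ",",
              row.names = FALSE, col.names = FALSE)
}

nrow(read.csv(path, header = FALSE))  # 2: the duplicate row is gone
```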
Here is an answer where a for loop was required because the lapply equivalent ran into memory allocation errors.