I have a set of CSV files with duplicate entries, which I need to remove before rewriting the files with the same names and format.
Here is what I have done so far,
filenames <- list.files(pattern = "\\.csv$")
datalist <- lapply(filenames, function(x) read.csv(file = x, header = FALSE))
unique.list <- lapply(datalist, unique)
Now I'm stuck on separating the data frames in the list and rewriting each one under its original name. There is a similar question, which I tried to follow for hours, but I couldn't understand the steps.
I'd definitely use a for loop. Shhhhhh, don't tell anyone I said that. Why? Three reasons...
1. You are calling write.csv for its side-effect, not its return value, i.e. you want a file to be written to disk. Use *apply when you want a return value from your function.
2. There is no real speed advantage to using an *apply loop compared to a for loop here.
3. *apply functions will swallow memory on each iteration of the loop and are not guaranteed to free it up until all iterations have completed. In a for loop the memory is freed at the start of the next iteration if you are overwriting objects inside the loop. If you are working with big CSV files this can be an advantage. I will try to find a link to an answer where for solved a problem that lapply could not due to memory issues.

So all you need for my solution, given your de-duplicated data list, is...
for (i in seq_along(filenames)) {
  # row.names = FALSE avoids adding a row-number column to the file;
  # if your files have no header row, consider write.table(..., sep = ",",
  # row.names = FALSE, col.names = FALSE) instead
  write.csv(unique.list[[i]], filenames[[i]], row.names = FALSE)
}
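For completeness, here is a self-contained sketch of the whole round trip. The file names and sample data are made up for illustration, and it uses read.csv's default header handling rather than header = FALSE:

```r
# Work in a temporary directory with two made-up CSV files
# that contain duplicate rows (names and data are illustrative only)
dir <- tempdir()
old <- setwd(dir)

write.csv(data.frame(a = c(1, 1, 2), b = c("x", "x", "y")),
          "one.csv", row.names = FALSE)
write.csv(data.frame(a = c(3, 3), b = c("z", "z")),
          "two.csv", row.names = FALSE)

# Read every CSV, drop duplicate rows, and overwrite each file in place
filenames <- list.files(pattern = "\\.csv$")
datalist <- lapply(filenames, read.csv)
unique.list <- lapply(datalist, unique)

for (i in seq_along(filenames)) {
  write.csv(unique.list[[i]], filenames[[i]], row.names = FALSE)
}

# "one.csv" had 3 rows with one duplicate, so 2 remain
rows_after <- nrow(read.csv("one.csv"))

setwd(old)
```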
Here is an answer where a for loop was required because the lapply equivalent ran into memory allocation errors.