 

Read a list of files, apply function and rewrite with same name

Tags: r, csv

I have a set of CSV files with duplicate entries. I need to remove the duplicates and rewrite each file with the same name and format.

Here is what I have done so far:

filenames <- list.files(pattern = "\\.csv$")
datalist <- lapply(filenames, function(x) read.csv(file = x, header = FALSE))
unique.list <- lapply(datalist, unique)

I'm stuck on separating the data frames in the list and rewriting each one under its original file name. There is a similar question, but after trying for hours I couldn't follow the procedure described there.

Freezon asked Jan 13 '23 04:01



1 Answer

I'd definitely use a for loop. Shhhhhh, don't tell anyone I said that. Why? Three reasons...

  1. You want to call write.csv for its side effect, not its return value, i.e. you want a file to be written to disk. Use *apply when you want a return value from your function.
  2. The main bottleneck will be disk I/O, so I expect no performance overhead from using a for loop compared to using an *apply loop.
  3. *apply functions will consume memory on each iteration of the loop and are not guaranteed to free it until all iterations have completed. In a for loop the memory is freed at the start of the next iteration if you are overwriting objects inside the loop. If you are working with big CSV files this could be an advantage. I will try to find a link to an answer where a for loop solved a problem that lapply could not because of memory issues.

So, given your de-duplicated data list, all you need for my solution is...

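for (i in seq_along(filenames)) {
  # write.table() can drop the header and row names, so files read with header = FALSE keep their format
  write.table(unique.list[[i]], filenames[[i]], sep = ",", row.names = FALSE, col.names = FALSE)
}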
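For anyone who wants to try the whole round trip before touching real data, here is a minimal sketch under a few assumptions. The directory name, file names and sample rows are made up for illustration, and the files are assumed to be headerless, as in the question. It uses write.table() rather than write.csv(), because write.csv() always writes a header line and adds row names by default, which would change the format of a headerless file.

```r
# Hypothetical demo: work in a throwaway directory so no real files are touched.
demo_dir <- file.path(tempdir(), "dedup_demo")
dir.create(demo_dir, showWarnings = FALSE)

# Create two small headerless CSV files containing duplicate rows (made-up data).
writeLines(c("1,a", "2,b", "1,a", "3,c"), file.path(demo_dir, "first.csv"))
writeLines(c("x,10", "x,10", "y,20"), file.path(demo_dir, "second.csv"))

# Match only names ending in ".csv"; full.names = TRUE avoids relying on setwd().
filenames <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

for (f in filenames) {
  dat <- read.csv(f, header = FALSE)  # headerless input, as in the question
  dat <- unique(dat)                  # drop duplicated rows, keeping the first occurrence
  # Overwrite the same file without header or row names.
  # quote = FALSE is an assumption that no field contains a comma or a quote;
  # drop that argument if your data might.
  write.table(dat, f, sep = ",", row.names = FALSE, col.names = FALSE, quote = FALSE)
}

# Read the rewritten files back as plain text for inspection.
result_first <- readLines(file.path(demo_dir, "first.csv"))
result_second <- readLines(file.path(demo_dir, "second.csv"))
```

Reading, de-duplicating and writing inside one loop also means only one file's data is held in memory at a time, which fits the memory argument in point 3.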

Here is an answer where a for loop was required because the lapply equivalent ran into memory allocation errors.

Simon O'Hanlon answered Jan 25 '23 23:01
