I am doing a relatively simple piece of analysis that I have put into a function on all the files in a particular folder. I was wondering whether anyone had any tips to help me automate the process on a number of different folders.
files <- (Sys.glob("*.csv"))
...which I found from Using R to list all files with a specified extension
And then the following code reads all those files into R.
listOfFiles <- lapply(files, function(x) read.table(x, header = FALSE))
…from Manipulating multiple files in R
But the files seem to be read in as one continuous list and not individual files… how can I change the script to open all the csv files in a particular folder as individual dataframes?
Secondly, assuming that I can read all the files in separately, how do I complete a function on all these dataframes in one go. For example, I have created four small dataframes so I can illustrate what I want:
Df.1 <- data.frame(A = c(5,4,7,6,8,4),B = (c(1,5,2,4,9,1))) Df.2 <- data.frame(A = c(1:6),B = (c(2,3,4,5,1,1))) Df.3 <- data.frame(A = c(4,6,8,0,1,11),B = (c(7,6,5,9,1,15))) Df.4 <- data.frame(A = c(4,2,6,8,1,0),B = (c(3,1,9,11,2,16)))
I have also made up an example function:
Summary<-function(dfile){ SumA<-sum(dfile$A) MinA<-min(dfile$A) MeanA<-mean(dfile$A) MedianA<-median(dfile$A) MaxA<-max(dfile$A) sumB<-sum(dfile$B) MinB<-min(dfile$B) MeanB<-mean(dfile$B) MedianB<-median(dfile$B) MaxB<-max(dfile$B) Sum<-c(sumA,sumB) Min<-c(MinA,MinB) Mean<-c(MeanA,MeanB) Median<-c(MedianA,MedianB) Max<-c(MaxA,MaxB) rm(sumA,sumB,MinA,MinB,MeanA,MeanB,MedianA,MedianB,MaxA,MaxB) Label<-c("A","B") dfile_summary<-data.frame(Label,Sum,Min,Mean,Median,Max) return(dfile_summary)}
I would ordinarily use the following command to apply the function to each individual dataframe.
Df1.summary<-Summary(dfile)
Is there a way instead of applying the function to all the dataframes, and use the titles of the dataframes in the summary tables (i.e. Df1.summary).
Many thanks,
Katie
To list all files in a directory in R programming language we use list. files(). This function produces a list containing the names of files in the named directory. It returns a character vector containing the names of the files in the specified directories.
os. listdir() method in python is used to get the list of all files and directories in the specified directory. If we don't specify any directory, then list of files and directories in the current working directory will be returned.
The list. dirs() method in R language is used to retrieve a list of directories present within the path specified. The output returned is in the form of a character vector containing the names of the files contained in the specified directory path, or returns null if no directories were returned.
On the contrary, I do think working with list
makes it easy to automate such things.
Here is one solution (I stored your four dataframes in folder temp/
).
filenames <- list.files("temp", pattern="*.csv", full.names=TRUE) ldf <- lapply(filenames, read.csv) res <- lapply(ldf, summary) names(res) <- substr(filenames, 6, 30)
It is important to store the full path for your files (as I did with full.names
), otherwise you have to paste the working directory, e.g.
filenames <- list.files("temp", pattern="*.csv") paste("temp", filenames, sep="/")
will work too. Note that I used substr
to extract file names while discarding full path.
You can access your summary tables as follows:
> res$`df4.csv` A B Min. :0.00 Min. : 1.00 1st Qu.:1.25 1st Qu.: 2.25 Median :3.00 Median : 6.00 Mean :3.50 Mean : 7.00 3rd Qu.:5.50 3rd Qu.:10.50 Max. :8.00 Max. :16.00
If you really want to get individual summary tables, you can extract them afterwards. E.g.,
for (i in 1:length(res)) assign(paste(paste("df", i, sep=""), "summary", sep="."), res[[i]])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With