Read all files in a folder and apply a function to each data frame

Q: How do I read all files in a folder in R?

To list all files in a directory in R, use list.files(). It returns a character vector containing the names of the files in the specified directory (or directories).
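As a rough illustration of how list.files() behaves, here is a minimal sketch. The directory name and file names are made up for the demo; the `pattern` and `full.names` arguments are the ones documented for list.files().

```r
# Create a throwaway directory with a few empty files (hypothetical names).
demo_dir <- file.path(tempdir(), "listfiles_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("a.csv", "b.csv", "notes.txt")))

# All entries, returned as a character vector of bare file names.
all_files <- list.files(demo_dir)

# Only the .csv files; `pattern` is a regular expression, not a shell glob.
csv_files <- list.files(demo_dir, pattern = "\\.csv$")

# full.names = TRUE prepends the directory, so the paths can be opened directly.
csv_paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
```

The full paths in `csv_paths` are what you would normally hand to read.csv(), since the bare names only resolve if the working directory happens to be `demo_dir`.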

Q: How do I read all files in a directory in Python?

The os.listdir() function in Python returns a list of all files and directories in the specified directory. If no directory is specified, it returns the files and directories in the current working directory.

Q: How do I get a list of files in a folder in R?

The list.dirs() function in R lists the directories found under the specified path, not the files. It returns a character vector of directory paths (an empty one if nothing is found, not NULL). To get the files in a folder, use list.files() instead.

Tags: list, r, lapply, summary

I am doing a relatively simple piece of analysis that I have put into a function on all the files in a particular folder. I was wondering whether anyone had any tips to help me automate the process on a number of different folders.

Firstly, I was wondering whether there was a way of reading all the files in a particular folder straight into R. I believe the following command will list all the files:

files <- (Sys.glob("*.csv"))

...which I found from Using R to list all files with a specified extension

And then the following code reads all those files into R.

listOfFiles <- lapply(files, function(x) read.table(x, header = FALSE))

…from Manipulating multiple files in R

But the files seem to be read in as one continuous list and not individual files… how can I change the script to open all the csv files in a particular folder as individual dataframes?

Secondly, assuming that I can read all the files in separately, how do I apply a function to all these dataframes in one go? For example, I have created four small dataframes to illustrate what I want:

Df.1 <- data.frame(A = c(5,4,7,6,8,4), B = c(1,5,2,4,9,1))
Df.2 <- data.frame(A = 1:6, B = c(2,3,4,5,1,1))
Df.3 <- data.frame(A = c(4,6,8,0,1,11), B = c(7,6,5,9,1,15))
Df.4 <- data.frame(A = c(4,2,6,8,1,0), B = c(3,1,9,11,2,16))

I have also made up an example function:

Summary <- function(dfile) {
  # Statistics for column A
  sumA <- sum(dfile$A)
  MinA <- min(dfile$A)
  MeanA <- mean(dfile$A)
  MedianA <- median(dfile$A)
  MaxA <- max(dfile$A)

  # Statistics for column B
  sumB <- sum(dfile$B)
  MinB <- min(dfile$B)
  MeanB <- mean(dfile$B)
  MedianB <- median(dfile$B)
  MaxB <- max(dfile$B)

  # Combine into one row per column
  Sum <- c(sumA, sumB)
  Min <- c(MinA, MinB)
  Mean <- c(MeanA, MeanB)
  Median <- c(MedianA, MedianB)
  Max <- c(MaxA, MaxB)
  rm(sumA, sumB, MinA, MinB, MeanA, MeanB, MedianA, MedianB, MaxA, MaxB)

  Label <- c("A", "B")
  dfile_summary <- data.frame(Label, Sum, Min, Mean, Median, Max)
  return(dfile_summary)
}

I would ordinarily use the following command to apply the function to each individual dataframe.

Df1.summary <- Summary(Df.1)

Is there a way instead to apply the function to all the dataframes at once, and to use the names of the dataframes in the names of the summary tables (e.g. Df1.summary)?

Many thanks,

Katie

581

asked Mar 05 '12 09:03

KT_1

1 Answer

On the contrary, I do think working with a list makes it easy to automate such things.

Here is one solution (I stored your four dataframes in folder temp/).

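filenames <- list.files("temp", pattern = "\\.csv$", full.names = TRUE)  # pattern is a regex
ldf <- lapply(filenames, read.csv)      # a list with one data frame per file
res <- lapply(ldf, summary)             # one summary table per data frame
names(res) <- substr(filenames, 6, 30)  # strip the leading "temp/"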
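To make it clearer that each element of the list is already an individual data frame, here is a self-contained sketch of the same idea. It writes the four example data frames from the question to a scratch folder under tempdir() (an assumption for the demo, standing in for temp/), reads them back, and names the list elements after the files.

```r
# Write the question's four example data frames to a scratch folder.
tmp <- file.path(tempdir(), "temp")
dir.create(tmp, showWarnings = FALSE)
dfs <- list(
  df1 = data.frame(A = c(5, 4, 7, 6, 8, 4),  B = c(1, 5, 2, 4, 9, 1)),
  df2 = data.frame(A = 1:6,                  B = c(2, 3, 4, 5, 1, 1)),
  df3 = data.frame(A = c(4, 6, 8, 0, 1, 11), B = c(7, 6, 5, 9, 1, 15)),
  df4 = data.frame(A = c(4, 2, 6, 8, 1, 0),  B = c(3, 1, 9, 11, 2, 16))
)
for (nm in names(dfs)) {
  write.csv(dfs[[nm]], file.path(tmp, paste0(nm, ".csv")), row.names = FALSE)
}

# Read every CSV back; lapply() returns one list element per file.
filenames <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
ldf <- lapply(filenames, read.csv)
names(ldf) <- basename(filenames)

# Each element is a separate data frame and can be used on its own.
df4 <- ldf[["df4.csv"]]
rows_per_file <- sapply(ldf, nrow)
```

Naming the list right after reading means `ldf[["df4.csv"]]` or `ldf$df4.csv` gives you a single data frame, so there is usually no need to create separate top-level objects.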

It is important to store the full path for your files (as I did with full.names); otherwise you have to prepend the directory name yourself, e.g.

filenames <- list.files("temp", pattern = "\\.csv$")
filenames <- paste("temp", filenames, sep = "/")

will work too. Note that I used substr to extract the file names while discarding the rest of the path.
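If counting characters for substr feels fragile, one alternative (not what I used above, just a sketch) is to let base R split the path. The paths below are illustrative; basename(), file.path() and tools::file_path_sans_ext() are standard functions.

```r
# Illustrative paths, as list.files(..., full.names = TRUE) would return them.
filenames <- c("temp/df1.csv", "temp/df2.csv")

# basename() drops the directory part regardless of its length.
short_names <- basename(filenames)

# file_path_sans_ext() additionally drops the extension.
bare_names <- tools::file_path_sans_ext(short_names)

# file.path() is an alternative to paste(..., sep = "/") for building paths.
rebuilt <- file.path("temp", short_names)
```

This keeps working if the folder name changes, whereas the substr offsets would have to be adjusted by hand.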

You can access your summary tables as follows:

> res$`df4.csv`
       A              B
 Min.   :0.00   Min.   : 1.00
 1st Qu.:1.25   1st Qu.: 2.25
 Median :3.00   Median : 6.00
 Mean   :3.50   Mean   : 7.00
 3rd Qu.:5.50   3rd Qu.:10.50
 Max.   :8.00   Max.   :16.00

If you really want to get individual summary tables, you can extract them afterwards. E.g.,

for (i in 1:length(res))
  assign(paste(paste("df", i, sep=""), "summary", sep="."), res[[i]])
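The same pattern should work with a custom function in place of summary. Below is a minimal sketch, assuming every data frame has numeric columns A and B as in the question; `summarise_ab` is a hypothetical, compact stand-in for the question's Summary function, and the list names are chosen so that the results come out as Df1.summary and so on.

```r
# Hypothetical compact stand-in for the question's Summary();
# assumes the data frame has numeric columns A and B.
summarise_ab <- function(dfile) {
  cols <- dfile[c("A", "B")]
  data.frame(
    Label  = names(cols),
    Sum    = sapply(cols, sum),
    Min    = sapply(cols, min),
    Mean   = sapply(cols, mean),
    Median = sapply(cols, median),
    Max    = sapply(cols, max),
    row.names = NULL
  )
}

# Two of the question's example data frames, kept in a named list.
ldf <- list(
  Df1 = data.frame(A = c(5, 4, 7, 6, 8, 4), B = c(1, 5, 2, 4, 9, 1)),
  Df4 = data.frame(A = c(4, 2, 6, 8, 1, 0), B = c(3, 1, 9, 11, 2, 16))
)

# Apply the function to every data frame and label the results.
res <- lapply(ldf, summarise_ab)
names(res) <- paste0(names(ldf), ".summary")

df1_summary <- res$Df1.summary
```

Keeping the results in the named list `res` gives you the Df1.summary-style labels without creating many separate objects, and the list can still be looped over later.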

146

answered Oct 14 '22 17:10

chl
