I have a set of .stat files in the /tmp directory.
sample:
a.stat=>
abc,10
abc,20
abc,30
b.stat=>
xyz,10
xyz,30
xyz,70
and so on
I need a summary of every .stat file.
Currently I am using
filelist<-list.files(path="/tmp/",pattern="\\.stat$")
data<-sapply(paste("/tmp/",filelist,sep=''), read.csv, header=FALSE)
However, I need to apply summary to every file being read. Put simply, for n .stat files I need a summary of the 2nd column.
Using
data<-sapply(paste("/tmp/",filelist,sep=''), summary, read.csv, header=FALSE)
does not work and gives me a summary of class character, which is not what I intend.
sapply(filelist, function(filename){df <- read.csv(filename, header=FALSE);print(summary(df[,2]))})
works fine. However, my overall objective is to find outliers, i.e. values that are more than 2 standard deviations away from the mean on either side. So I use sd, but I also need to check whether all values in the file currently being read fall within the 2 SD range.
Difference between lapply() and sapply(): both apply a function to each element of a list or vector. lapply() always returns a list, whereas sapply() tries to simplify the result to a vector or matrix.
The apply functions (apply, sapply, lapply etc.) are at best marginally faster than a regular for loop, because the function you pass is still evaluated in R for every element rather than being replaced by vectorized C code. For a beginner, it can also be difficult to understand why you would want to use one of these functions with their arcane syntax.
The sapply() function in R takes a list, vector or data frame as input and returns a vector or matrix when the results can be simplified. When they cannot, it returns a list of the same length as the input, just as lapply() does.
Vector output: sapply and vapply
sapply() and vapply() are very similar to lapply() except they simplify their output to produce an atomic vector.
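The difference in return types is easiest to see on a small input. The following is a minimal sketch; the list vals is made up for illustration and is not part of the question's data.

```r
# Hypothetical input: a named list of numeric vectors
vals <- list(a = c(10, 20, 30), b = c(10, 30, 70))

as_list   <- lapply(vals, mean)               # always a list
as_vector <- sapply(vals, mean)               # simplified to a named numeric vector
checked   <- vapply(vals, mean, numeric(1))   # like sapply, but the result type is declared

# When every result has the same length > 1, sapply returns a matrix
# with one column per input element
ranges <- sapply(vals, range)

str(as_list)
print(as_vector)
print(checked)
print(ranges)
```

vapply() is usually the safer choice inside scripts, since it raises an error if a result does not match the declared type instead of silently changing the shape of the output.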
To apply multiple functions at once:
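# x is assumed to be a list of numeric vectors
f <- function(x){
  # a named numeric vector lets sapply() simplify the result to a matrix
  c(sum = sum(x), mean = mean(x))
}
sapply(x, f)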
In your case you want to apply them sequentially, so first read the csv data, then compute the summary:
sapply(lapply(paste("/tmp/",filelist,sep=''), read.csv, header=FALSE), summary)
To run summary on a particular column only, change the function passed to the outer sapply from summary to function(x) summary(x[[2]]).
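Putting the pieces together, here is a rough sketch of the whole task, including the 2 SD check from the question. It writes the question's sample data into a temporary directory so it can be run as-is; stat_dir stands in for /tmp/, and find_outliers is a hypothetical helper name, not an existing R function.

```r
# Self-contained demo: write the sample files from the question into a
# temporary directory (stat_dir is a stand-in for /tmp/)
stat_dir <- file.path(tempdir(), "stat_demo")
dir.create(stat_dir, showWarnings = FALSE)
writeLines(c("abc,10", "abc,20", "abc,30"), file.path(stat_dir, "a.stat"))
writeLines(c("xyz,10", "xyz,30", "xyz,70"), file.path(stat_dir, "b.stat"))

# full.names = TRUE returns complete paths, so no paste() is needed;
# the pattern is a regular expression, hence the escaped dot
files <- list.files(stat_dir, pattern = "\\.stat$", full.names = TRUE)
names(files) <- basename(files)

# Read each file once and keep only the numeric second column
values <- lapply(files, function(f) read.csv(f, header = FALSE)[[2]])

# Per-file summary of column 2: a numeric matrix, one column per file
summaries <- sapply(values, summary)

# Hypothetical helper: values more than k standard deviations from the mean
find_outliers <- function(x, k = 2) {
  x[abs(x - mean(x)) > k * sd(x)]
}

outliers   <- lapply(values, find_outliers)   # a list, since lengths can differ
all_within <- sapply(outliers, function(o) length(o) == 0)

print(summaries)
print(all_within)
```

Here all_within is TRUE for a file when every value lies within 2 SD of that file's mean. Note that with only three rows per file, as in the sample, no value can be more than 2 SD from the mean, so the check only becomes meaningful on larger files.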