I have about 100 files in the following format. Each file has its own file name, but all of them are saved in the same directory. For example, filecd is as follows:
A B C D
ab 0.3 0.0 0.2 0.20
cd 0.7 0.0 0.3 0.77
ef 0.8 0.1 0.5 0.91
gh 0.3 0.5 0.6 0.78
fileabb is as follows:
A B C D
ab 0.3 0.9 1.0 0.20
gh 0.3 0.5 0.6 0.9
All of these files have the same number of columns but different numbers of rows.
I want to summarize each file as one row: a column gets 0 if all of its cells are < 0.8, and 1 if ANY of its cells is greater than or equal to 0.8. The summarized results should be saved in a separate CSV file as follows:
A B C D
filecd 1 0 0 1
fileabb 0 1 1 1
... and so on for all 100 files
Instead of reading and processing each file separately, can this be done efficiently in R? Could you help me with how to do so? Thanks.
For ease of discussion, I have added the following lines as sample input files:
file1 <- data.frame(A=c(0.3, 0.7, 0.8, 0.3), B=c(0,0,0.1,0.5), C=c(0.2,0.3,0.5,0.6), D=c(0.2,0.77,0.91, 0.78))
file2 <- data.frame(A=c(0.3, 0.3), B=c(0.9,0.5), C=c(1,0.6), D=c(0.2,0.9))
Please kindly give me some more advice. Many thanks.
First make a vector of all the filenames.
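filenames <- dir(your_data_dir) # you may also need the pattern argument
# dir() returns bare file names here, so either setwd(your_data_dir) first or pass full.names = TRUE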
Then read the data into a list of data frames.
data_list <- lapply(filenames, function(fn) as.matrix(read.delim(fn)))
#maybe with other arguments passed to read.delim
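Which arguments are needed depends on the delimiter, which the question does not state. As a hedged sketch, assuming the files are whitespace-separated exactly as the samples are shown, read.table() can be used instead of read.delim(). Because the header line has one fewer field than the data rows, the first column is taken as row names. The temporary file below is purely illustrative:

```r
# Assumption: files are whitespace-delimited, as the question's samples suggest.
# Write a small illustrative file to a temporary location.
tmp <- tempfile(fileext = ".txt")
writeLines(c("A B C D",
             "ab 0.3 0.9 1.0 0.20",
             "gh 0.3 0.5 0.6 0.9"), tmp)

# The header has one fewer field than the data rows, so read.table()
# uses the first column as row names automatically.
m <- as.matrix(read.table(tmp, header = TRUE))
print(m)
```

If the real files are tab-separated, read.delim() as used above should behave the same way with respect to row names.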
Now calculate the summary.
summarised <- lapply(data_list, function(dfr)
{
  # for each column: 1 if any value is >= 0.8, otherwise 0
  apply(dfr, 2, function(col) as.integer(any(col >= 0.8)))
})
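As a quick check of the summary rule, here is a minimal sketch that applies it to the two sample data frames from the question. The helper name flag_columns is hypothetical, and vapply() is swapped in for apply() because the inputs here are data frames rather than matrices:

```r
# Sample data frames copied from the question.
file1 <- data.frame(A = c(0.3, 0.7, 0.8, 0.3), B = c(0, 0, 0.1, 0.5),
                    C = c(0.2, 0.3, 0.5, 0.6), D = c(0.2, 0.77, 0.91, 0.78))
file2 <- data.frame(A = c(0.3, 0.3), B = c(0.9, 0.5),
                    C = c(1, 0.6), D = c(0.2, 0.9))

# Hypothetical helper: 1 if any value in a column is >= 0.8, otherwise 0.
flag_columns <- function(dfr) {
  vapply(dfr, function(col) as.integer(any(col >= 0.8)), integer(1))
}

print(flag_columns(file1))  # A = 1, B = 0, C = 0, D = 1
print(flag_columns(file2))  # A = 0, B = 1, C = 1, D = 1
```

These values agree with the filecd and fileabb rows of the expected output in the question.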
Convert this list into a matrix.
summary_matrix <- do.call(rbind, summarised)
Make the row names match the file names.
rownames(summary_matrix) <- basename(filenames) # basename() drops any directory part
Now write out to CSV.
write.csv(summary_matrix, "my_summary_matrix.csv")
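To see the steps working together, here is a self-contained sketch that can be run as-is. It is only an illustration under stated assumptions: the files are whitespace-separated, the temporary directory and output path are made up for the demo, and colSums() is used as an alternative way to test whether any value in a column reaches the threshold.

```r
# Illustrative only: create a fresh temporary directory with two sample files.
data_dir <- tempfile("summary_demo")
dir.create(data_dir)
writeLines(c("A B C D", "ab 0.3 0.0 0.2 0.20", "cd 0.7 0.0 0.3 0.77",
             "ef 0.8 0.1 0.5 0.91", "gh 0.3 0.5 0.6 0.78"),
           file.path(data_dir, "filecd"))
writeLines(c("A B C D", "ab 0.3 0.9 1.0 0.20", "gh 0.3 0.5 0.6 0.9"),
           file.path(data_dir, "fileabb"))

# List the files with full paths so reading works from any working directory.
paths <- list.files(data_dir, full.names = TRUE)

# One logical vector per file: does any value in the column reach 0.8?
rows <- lapply(paths, function(p) {
  m <- as.matrix(read.table(p, header = TRUE))
  colSums(m >= 0.8) > 0
})

# Stack the rows, convert TRUE/FALSE to 1/0, and label rows by file name.
summary_matrix <- do.call(rbind, rows) * 1L
rownames(summary_matrix) <- basename(paths)

# Write the result outside data_dir so it is not picked up as an input file.
out <- file.path(tempdir(), "my_summary_matrix.csv")
write.csv(summary_matrix, out)
print(read.csv(out, row.names = 1))
```

Note that list.files() returns names in alphabetical order, so fileabb comes before filecd in the output.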