
How to summarize multiple files into one file based on an assigned rule?

Tags: r, csv

I have about 100 files in the following format. Each file has its own name, but all of them are saved in the same directory. For example, filecd is as follows:

   A    B    C    D
ab 0.3  0.0  0.2  0.20
cd 0.7  0.0  0.3  0.77
ef 0.8  0.1  0.5  0.91
gh 0.3  0.5  0.6  0.78

fileabb is as follows:

   A    B    C    D
ab 0.3  0.9  1.0  0.20
gh 0.3  0.5  0.6  0.9

All these files have the same number of columns but different numbers of rows.

I want to summarize each file as one row: 0 if all cells in a column are < 0.8, and 1 if ANY cell in that column is greater than or equal to 0.8. The summarized results should be saved in a separate CSV file as follows:

        A B C D    
filecd  1 0 0 1
fileabb 0 1 1 1
..... till 100

Instead of reading and processing each file separately, can this be done efficiently in R? Could you help me with how to do so? Thanks.

For ease of discussion, I have added the following lines as sample input:

file1 <- data.frame(A=c(0.3, 0.7, 0.8, 0.3), B=c(0,0,0.1,0.5), C=c(0.2,0.3,0.5,0.6), D=c(0.2,0.77,0.91, 0.78))

file2 <- data.frame(A=c(0.3, 0.3), B=c(0.9,0.5), C=c(1,0.6), D=c(0.2,0.9))

Please kindly give me some more advice. Many thanks.

psiu asked Jan 22 '26 18:01


1 Answer

First make a vector of all the filenames.

filenames <- dir(your_data_dir)  #you may also need the pattern argument, and full.names = TRUE if your_data_dir is not the working directory

Then read the data into a list of matrices, one per file.

data_list <- lapply(filenames, function(fn) as.matrix(read.delim(fn))) 
#maybe with other arguments passed to read.delim (for example sep or row.names, depending on the file layout)
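
The sample files in the question look whitespace-separated, with a header line that has one fewer field than the data rows. Assuming the real files follow that layout (they may not), read.table uses the first column as row names automatically, so the resulting matrix stays numeric. A minimal sketch, with inline text standing in for a file on disk:

```r
# Inline copy of the question's filecd; a stand-in for a real file.
txt <- "   A    B    C    D
ab 0.3  0.0  0.2  0.20
cd 0.7  0.0  0.3  0.77
ef 0.8  0.1  0.5  0.91
gh 0.3  0.5  0.6  0.78"

# With header = TRUE and a header line one field shorter than the data
# rows, read.table takes the first column as row names.
filecd <- as.matrix(read.table(text = txt, header = TRUE))

is.numeric(filecd)  # checks that the label column did not leak into the data
dim(filecd)         # rows x columns of the numeric part
```

If the labels were read as an ordinary column instead, as.matrix would give a character matrix and the comparison against 0.8 would no longer be numeric.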

Now calculate the summary.

summarised <- lapply(data_list, function(dfr)
{
  #for each column, TRUE if any value reaches the 0.8 threshold
  apply(dfr, 2, function(col) any(col >= 0.8))
})
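
To check the rule against the sample data from the question, here is a small sketch. The helper name flag_columns and its threshold argument are made up for illustration; it also converts the logical result to 0/1, which is the form the question asks for.

```r
# Sample inputs copied from the question.
file1 <- data.frame(A = c(0.3, 0.7, 0.8, 0.3), B = c(0, 0, 0.1, 0.5),
                    C = c(0.2, 0.3, 0.5, 0.6), D = c(0.2, 0.77, 0.91, 0.78))
file2 <- data.frame(A = c(0.3, 0.3), B = c(0.9, 0.5),
                    C = c(1, 0.6), D = c(0.2, 0.9))

# Hypothetical helper: 1 if any value in a column is >= threshold, else 0.
flag_columns <- function(m, threshold = 0.8) {
  apply(as.matrix(m), 2, function(col) as.integer(any(col >= threshold)))
}

flag_columns(file1)  # A = 1, B = 0, C = 0, D = 1
flag_columns(file2)  # A = 0, B = 1, C = 1, D = 1
```

These match the filecd and fileabb rows of the expected output in the question.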

Convert this list into a matrix.

summary_matrix <- do.call(rbind, summarised)

Make the row names match the file names.

rownames(summary_matrix) <- filenames

Now write out to CSV.

write.csv(summary_matrix, "my_summary_matrix.csv")
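
Putting the steps together, the following is a self-contained sketch that can be run as-is. It is not the only way to do this: it assumes whitespace-separated files like the samples in the question, swaps read.delim for read.table for that reason, and uses a temporary directory (data_dir and out_file are names invented here) so no real data is touched.

```r
# Work in a throwaway directory so nothing real is touched.
data_dir <- file.path(tempdir(), "summary_demo")
dir.create(data_dir, showWarnings = FALSE)

# Write the question's two sample tables as whitespace-separated files.
writeLines(c("A B C D",
             "ab 0.3 0.0 0.2 0.20",
             "cd 0.7 0.0 0.3 0.77",
             "ef 0.8 0.1 0.5 0.91",
             "gh 0.3 0.5 0.6 0.78"), file.path(data_dir, "filecd"))
writeLines(c("A B C D",
             "ab 0.3 0.9 1.0 0.20",
             "gh 0.3 0.5 0.6 0.9"), file.path(data_dir, "fileabb"))

# Full paths, so reading does not depend on the working directory.
filenames <- list.files(data_dir, full.names = TRUE)

# One 0/1 vector per file: 1 if any value in the column is >= 0.8.
summarised <- lapply(filenames, function(fn) {
  m <- as.matrix(read.table(fn, header = TRUE))
  apply(m, 2, function(col) as.integer(any(col >= 0.8)))
})

# Stack the vectors into one matrix, one row per file.
summary_matrix <- do.call(rbind, summarised)
rownames(summary_matrix) <- basename(filenames)

out_file <- file.path(tempdir(), "my_summary_matrix.csv")
write.csv(summary_matrix, out_file)
```

Using basename for the row names keeps the CSV labels short even though the files were read by full path.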

Richie Cotton answered Jan 24 '26 07:01