Hi, I have a lot of CSV files to process. Each file is generated by one run of an algorithm. My data always has one key and one value, like this:
csv1:
  index value
1     1     1
2     2     1
3     3     1
4     4     1
5     5     1
csv2:
  index value
1     4     3
2     5     3
3     6     3
4     7     3
5     8     3
Now I want to aggregate the data from these CSV files, like this:
When both files contain the same key, e.g. 5, the resulting row should contain that shared key (5) and the mean of both values ((1+3)/2 = 2). If only one file contains a key (e.g. 2), that row is just added to the result table (key = 2, value = 1).
Something like this:
  index value
1     1     1
2     2     1
3     3     1
4     4     2   (as (1+3)/2 = 2)
5     5     2   (as (1+3)/2 = 2)
6     6     3
7     7     3
8     8     3
At first I thought rbind() would do the job, but it does not aggregate the values, it only concatenates the data. How can I achieve that in R?
In R, the merge() function joins two data frames on one or more shared columns, much like a JOIN in SQL or any other DBMS; the columns being joined on must have compatible types. The dplyr package offers the same operations as inner_join(), left_join(), right_join() and full_join(). To join more than two data frames, you can fold the join over a list, e.g. with base R's Reduce() (or purrr::reduce() from the tidyverse). By contrast, rbind() simply stacks data frames whose column names match on top of each other; it combines rows but does not join or aggregate anything.
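For the two-file case described in the question, a minimal sketch with base R merge() could look like the following (df1 and df2 stand for the data read from csv1 and csv2; this is only an illustration, not the full solution below):
# toy versions of csv1 and csv2 from the question
df1 <- data.frame(index = 1:5, value = 1)
df2 <- data.frame(index = 4:8, value = 3)
# full outer join on the shared key column
both <- merge(df1, df2, by = "index", all = TRUE, suffixes = c(".1", ".2"))
# mean of the two values; na.rm = TRUE keeps keys that occur in only one file
both$value <- rowMeans(both[, c("value.1", "value.2")], na.rm = TRUE)
both[, c("index", "value")]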
Here is a solution. I am following all the excellent comments so far, and hopefully adding value by showing you how to handle any number of files. I am assuming you have all your CSV files in the same directory (my.csv.dir below).
# locate the files (full.names = TRUE so read.csv gets the full paths)
files <- list.files(my.csv.dir, full.names = TRUE)
# read the files into a list of data.frames
data.list <- lapply(files, read.csv)
# concatenate into one big data.frame
data.cat <- do.call(rbind, data.list)
# aggregate: mean of value for each index
data.agg <- aggregate(value ~ index, data.cat, mean)
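With the two sample files from the question, and assuming they read in cleanly as comma-separated files (otherwise swap read.csv for read.table), data.agg should come out as the requested table:
data.agg
#   index value
# 1     1     1
# 2     2     1
# 3     3     1
# 4     4     2
# 5     5     2
# 6     6     3
# 7     7     3
# 8     8     3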
Edit: to handle your updated question in your comment below:
files <- list.files(my.csv.dir)
# the algorithm name is assumed to be the part of the file name before the first "-"
algo.name <- sub("-.*", "", files)
# read each file and tag its rows with the algorithm it came from
data.list <- lapply(file.path(my.csv.dir, files), read.csv)
data.list <- Map(transform, data.list, algorithm = algo.name)
data.cat <- do.call(rbind, data.list)
# mean of value per algorithm and index
data.agg <- aggregate(value ~ algorithm + index, data.cat, mean)
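The sub() call assumes each file name starts with the algorithm name followed by a dash; with hypothetical names it behaves like this:
sub("-.*", "", c("algoA-run1.csv", "algoA-run2.csv", "algoB-run1.csv"))
# [1] "algoA" "algoA" "algoB"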