Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

consolidating data frames in R

Tags:

r

csv

Hi I have a lot of CSV files to process. Each file is generated by a run of an algorithm. My data always has one key and a value like this:

csv1:

        index value
  1     1     1
  2     2     1
  3     3     1
  4     4     1
  5     5     1

csv2:

      index value
1     4     3
2     5     3
3     6     3
4     7     3
5     8     3

Now I want to aggregate these CSV data, like this:

When both files contain an identical key e.g. 5, the resulting row should contain the key both files share (5) and the mean of both values ((1+3)/2 = 2). If only one file contains a key (e.g. 2), this row is just added to the result table (key = 2, value = 1).

Something like this:

      index value
1     1     1
2     2     1
3     3     1
4     4     2 (as (1+4)/2 = 2)
5     5     2 (as (1+4)/2 = 2)
6     6     3
7     7     3
8     8     3

At first I thought rbind() does the job, but it does not aggregate the values, only concatenates the data. How can I achieve that with R?

like image 827
Matthias B Avatar asked Mar 21 '12 15:03

Matthias B


People also ask

How do I combine data frames in R?

In R we use merge() function to merge two dataframes in R. This function is present inside join() function of dplyr package. The most important condition for joining two dataframes is that the column type should be the same on which the merging happens. merge() function works similarly like join in DBMS.

Can you merge more than 2 Dataframes in R?

Join Multiple R DataFrames To join more than two (multiple) R dataframes, then reduce() is used. It is available in the tidyverse package which will convert all the dataframes to a list and join the dataframes based on the column. It is similar to the above joins.

How do I merge two data frames in the same column in R?

To combine two data frames with same columns in R language, call rbind() function, and pass the two data frames, as arguments. rbind() function returns the resulting data frame created from concatenating the given two data frames. For rbind() function to combine the given data frames, the column names must match.

How do I merge two data frames?

The concat() function can be used to concatenate two Dataframes by adding the rows of one to the other. The merge() function is equivalent to the SQL JOIN clause. 'left', 'right' and 'inner' joins are all possible.


1 Answers

Here is a solution. I am following all the excellent comments so far, and hopefully adding value by showing you how to handle any number of files. I am assuming you have all your csv files in the same directory (my.csv.dir below).

# locate the files
files <- list.files(my.csv.dir)

# read the files into a list of data.frames
data.list <- lapply(files, read.csv)

# concatenate into one big data.frame
data.cat <- do.call(rbind, data.list)

# aggregate
data.agg <- aggregate(value ~ index, data.cat, mean)

Edit: to handle your updated question in your comment below:

files     <- list.files(my.csv.dir)
algo.name <- sub("-.*", "", files)
data.list <- lapply(files, read.csv)
data.list <- Map(transform, data.list, algorithm = algo.name)
data.cat  <- do.call(rbind, data.list)
data.agg  <- aggregate(value ~ algorithm + index, data.cat, mean)
like image 101
flodel Avatar answered Oct 23 '22 08:10

flodel