
How to group and summarise each data frame in a list of data frames

Tags: r, dplyr

I have a list of data frames:

df1 <- data.frame(one = c('red','blue','green','red','red','blue','green','green'),
                  one.1 = as.numeric(c('1','1','0','1','1','0','0','0')))

df2 <- data.frame(two = c('red','yellow','green','yellow','green','blue','blue','red'),
                  two.2 = as.numeric(c('0','1','1','0','0','0','1','1')))

df3 <- data.frame(three = c('yellow','yellow','green','green','green','white','blue','white'),
                  three.3 = as.numeric(c('1','0','0','1','1','0','0','1')))

all <- list(df1,df2,df3)

I need to group each data frame by the first column and summarise the second column. Individually I would do something like this:

library(dplyr)

df1 <- df1 %>%
  group_by(one) %>%
  summarise(sum = sum(one.1))

However, I'm having trouble figuring out how to iterate over each item in the list.

I've thought of using a loop:

for(i in 1:3){
      all[i] <- all[i] %>%
      group_by_at(1) %>%
      summarise()
}

But I can't figure out how to specify a column to sum in the summarise() function (this loop is likely wrong in other ways than that anyway).

Ideally I need the output to be another list with each item being the summarised data, like so:

[[1]]
# A tibble: 3 x 2
  one     sum
  <fct> <dbl>
1 blue      1
2 green     0
3 red       3

[[2]]
# A tibble: 4 x 2
  two      sum
  <fct>  <dbl>
1 blue       1
2 green      1
3 red        1
4 yellow     1

[[3]]
# A tibble: 4 x 2
  three    sum
  <fct>  <dbl>
1 blue       0
2 green      2
3 white      1
4 yellow     1

Would really appreciate any help!

nogbad asked Jul 31 '19 10:07



2 Answers

Use purrr::map to iterate over the list, and summarise_at on the columns whose names contain a literal dot (\\.), selected with the matches() helper.

library(dplyr)
library(purrr)
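map(all, ~ .x %>%
    # alternative grouping: the column whose name ends with one, two, or three
    # group_by_at(vars(matches('one$|two$|three$'))) %>%
    group_by_at(1) %>%                        # group by the first column
    summarise_at(vars(matches('\\.')), sum))  # sum the column whose name contains a dot
# 2nd option, which names the result column "sum":
#   summarise_at(vars(matches('\\.')), list(sum = sum))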

[[1]]
# A tibble: 3 x 2
  one   one.1
  <fct> <dbl>
1 blue      1
2 green     0
3 red       3

[[2]]
# A tibble: 4 x 2
  two    two.2
  <fct>  <dbl>
1 blue       1
2 green      1
3 red        1
4 yellow     1

[[3]]
# A tibble: 4 x 2
  three  three.3
  <fct>    <dbl>
1 blue         0
2 green        2
3 white        1
4 yellow       1
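
In dplyr 1.0.0 and later, group_by_at() and summarise_at() are superseded by across(), so the same idea might look roughly like the sketch below. The helper name sum_by_first and the list name dfs are made up for illustration, and the sketch assumes the grouping column is always first and the value column always second.

```r
library(dplyr)

# Same example data as in the question. The list is called dfs here
# (a hypothetical name) so that it does not shadow base::all().
df1 <- data.frame(one = c('red','blue','green','red','red','blue','green','green'),
                  one.1 = c(1, 1, 0, 1, 1, 0, 0, 0))
df2 <- data.frame(two = c('red','yellow','green','yellow','green','blue','blue','red'),
                  two.2 = c(0, 1, 1, 0, 0, 0, 1, 1))
df3 <- data.frame(three = c('yellow','yellow','green','green','green','white','blue','white'),
                  three.3 = c(1, 0, 0, 1, 1, 0, 0, 1))
dfs <- list(df1, df2, df3)

# Hypothetical helper: group by column 1, then sum column 2 into "sum".
sum_by_first <- function(df) {
  value_col <- names(df)[2]          # name of the column to sum
  df %>%
    group_by(across(1)) %>%          # group by the first column, by position
    summarise(sum = sum(.data[[value_col]]), .groups = "drop")
}

result <- lapply(dfs, sum_by_first)  # a list of three summarised tibbles
```

Because the result column is always called sum, every element of result has the same second column name, which matches the output asked for in the question.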

A. Suliman answered Oct 22 '22 10:10


Here's a base R solution:

lapply(all, function(DF) aggregate(list(added = DF[, 2]), by = DF[, 1, drop = FALSE], FUN = sum))

[[1]]
    one added
1  blue     1
2 green     0
3   red     3

[[2]]
     two added
1   blue     1
2  green     1
3    red     1
4 yellow     1

[[3]]
   three added
1   blue     0
2  green     2
3  white     1
4 yellow     1
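
An explicit loop, closer to the attempt in the question, should also work. The sketch below is only one way to write it: it indexes with [[i]] rather than [i] (single brackets return a one-element list, not the data frame itself) and collects the results in a new list. The names dfs and result are made up for illustration.

```r
# Same example data as in the question, stored in a list called dfs
# (a hypothetical name).
df1 <- data.frame(one = c('red','blue','green','red','red','blue','green','green'),
                  one.1 = c(1, 1, 0, 1, 1, 0, 0, 0))
df2 <- data.frame(two = c('red','yellow','green','yellow','green','blue','blue','red'),
                  two.2 = c(0, 1, 1, 0, 0, 0, 1, 1))
df3 <- data.frame(three = c('yellow','yellow','green','green','green','white','blue','white'),
                  three.3 = c(1, 0, 0, 1, 1, 0, 0, 1))
dfs <- list(df1, df2, df3)

# Pre-allocate the output list, then fill it one data frame at a time.
result <- vector("list", length(dfs))
for (i in seq_along(dfs)) {
  df <- dfs[[i]]                     # [[i]] extracts the data frame itself
  # df[2] and df[1] are one-column data frames, so the original
  # column names are kept in the aggregated output.
  result[[i]] <- aggregate(df[2], by = df[1], FUN = sum)
}
```

Using seq_along(dfs) instead of 1:3 keeps the loop correct if the list grows or is empty.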

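Another approach would be to bind the data frames into a single table. Here I use data.table and bind by position rather than by column name (use.names = FALSE), so the combined columns take the names of the first data frame (one and one.1). The only problem is that this may mess up factors, but I'm not sure that's an issue in your case.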

library(data.table)
rbindlist(all, use.names = FALSE, idcol = 'id'
          )[, .(added = sum(one.1)), by = .(id, color = one)]

    id  color added
 1:  1    red     3
 2:  1   blue     1
 3:  1  green     0
 4:  2    red     1
 5:  2 yellow     1
 6:  2  green     1
 7:  2   blue     1
 8:  3 yellow     1
 9:  3  green     2
10:  3  white     1
11:  3   blue     0
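
Since the question asks for a list as output, the combined result could be split back by id. This is a rough sketch assuming the data.table package is installed; the names dfs, summed and as_list are made up for illustration.

```r
library(data.table)

# Same example data as in the question, stored in a list called dfs
# (a hypothetical name).
df1 <- data.frame(one = c('red','blue','green','red','red','blue','green','green'),
                  one.1 = c(1, 1, 0, 1, 1, 0, 0, 0))
df2 <- data.frame(two = c('red','yellow','green','yellow','green','blue','blue','red'),
                  two.2 = c(0, 1, 1, 0, 0, 0, 1, 1))
df3 <- data.frame(three = c('yellow','yellow','green','green','green','white','blue','white'),
                  three.3 = c(1, 0, 0, 1, 1, 0, 0, 1))
dfs <- list(df1, df2, df3)

# Bind by position; the combined columns are named after df1 (one, one.1).
summed <- rbindlist(dfs, use.names = FALSE, idcol = 'id')[
  , .(added = sum(one.1)), by = .(id, color = one)]

# Split into one data.table per original data frame, dropping the id column.
as_list <- split(summed, by = 'id', keep.by = FALSE)
```

Unlike the other solutions, every element of as_list has the same column names (color and added), because the original first-column names are lost when binding.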

Cole answered Oct 22 '22 08:10