Summing Multiple Groups of Columns

Tags: r, group-by, dplyr, purrr, summarization

I have a data frame containing the results of an image analysis, where each column is the proportion of a particular class present in the image. An example data frame, class_df, looks like this:

id    A    B    C    D    E    F
 1 0.20 0.30 0.10 0.15 0.25 0.00 
 2 0.05 0.10 0.05 0.30 0.10 0.40
 3 0.10 0.10 0.10 0.20 0.20 0.30

Each of these classes belongs to a functional group, and I want to create new columns holding the proportion of each functional group, calculated from its classes. An example mapping, class_fg, is:

class         fg
    A          Z
    B          Z
    C          Z
    D          Y
    E          Y
    F          X

The desired result would be (the vertical bar is added only to mark where the new columns start):

id    A    B    C    D    E    F |    X    Y    Z
 1 0.20 0.30 0.10 0.15 0.25 0.00 | 0.00 0.40 0.60
 2 0.05 0.10 0.05 0.30 0.10 0.40 | 0.40 0.40 0.20
 3 0.10 0.10 0.10 0.20 0.20 0.30 | 0.30 0.40 0.30

I can do this one functional group at a time using:

library(dplyr)

first_fg <- class_fg %>%
  filter(fg == "Z") %>%
  select(class) %>%
  unlist()

class_df <- class_df %>%
  mutate(Z = rowSums(select(., one_of(first_fg))))

Surely there is a better way to calculate the row sum for every functional group without repeating this code for each group? Maybe using purrr?


asked May 22 '18 05:05

Syzorr

4 Answers

Yet another option, alongside the working solutions already posted, is to use quasiquotation from the rlang package to build an expression that calculates the sum for each group.

library(tidyverse)

First, define a helper function for doing an elementwise sum of vectors:

psum <- function(...) reduce(list(...), `+`)
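To make the helper's behaviour concrete, here is a small sketch with two made-up vectors (a and b are illustrative names, not from the answer). reduce() folds the list pairwise with `+`, so several vectors are added element by element, and a single vector, as in group X which contains only column F, comes back unchanged.

```r
library(purrr)

# Same helper as above: fold any number of equal-length vectors with `+`
psum <- function(...) reduce(list(...), `+`)

# Illustrative vectors (columns A and B of class_df)
a <- c(0.20, 0.05, 0.10)
b <- c(0.30, 0.10, 0.10)

two <- psum(a, b)  # element-wise a + b
one <- psum(a)     # a single argument is returned as-is
print(two)
print(one)
```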

After extracting the groupings from class_fg into a named list, we can construct a list of expressions that calculate the sum for each group:

sum_exprs <- with(class_fg, split(class, fg)) %>% 
  map(~ rlang::expr(psum(!!!rlang::syms(.x))))

sum_exprs
#> $X
#> psum(F)
#> 
#> $Y
#> psum(D, E)
#> 
#> $Z
#> psum(A, B, C)
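A minimal sketch of what happens for a single group, assuming only rlang and purrr are installed: rlang::syms() turns the column names into symbols, !!! splices them in as separate arguments, and rlang::eval_tidy() evaluates the resulting call with the data columns in scope. The names cols, sum_expr and df are hypothetical, chosen only for illustration.

```r
library(purrr)

psum <- function(...) reduce(list(...), `+`)

# Column names for one functional group (group Y in the example)
cols <- c("D", "E")

# Splice the symbols D and E into a call to psum()
sum_expr <- rlang::expr(psum(!!!rlang::syms(cols)))
print(sum_expr)
#> psum(D, E)

# Evaluate the call with the data frame's columns acting as variables
df <- data.frame(D = c(0.15, 0.30, 0.20), E = c(0.25, 0.10, 0.20))
res <- rlang::eval_tidy(sum_expr, data = df)
print(res)
```

mutate() performs the same evaluation for every expression in sum_exprs, using each list name as the new column name.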

With the list of expressions ready, we can just "bang-bang-bang" (!!!) them into the data with mutate:

class_df %>%
  mutate(!!!sum_exprs)
#>   id    A   B    C    D    E   F   X   Y   Z
#> 1  1 0.20 0.3 0.10 0.15 0.25 0.0 0.0 0.4 0.6
#> 2  2 0.05 0.1 0.05 0.30 0.10 0.4 0.4 0.4 0.2
#> 3  3 0.10 0.1 0.10 0.20 0.20 0.3 0.3 0.4 0.3

(I used the code provided by @MKR in his answer to create the data).

Created on 2018-05-22 by the reprex package (v0.2.0).


answered Oct 16 '22 20:10

Mikko Marttila

We could split 'class_fg' by 'fg', loop through the list elements with map, select the corresponding columns of 'class_df', and get the rowSums:

library(tidyverse)
class_fg %>%
    split(.$fg) %>% 
    map_df(~ class_df %>%
                select(one_of(.x$class)) %>% 
                rowSums) %>%
    bind_cols(class_df, .)
#  id    A   B    C    D    E   F   X   Y   Z
#1  1 0.20 0.3 0.10 0.15 0.25 0.0 0.0 0.4 0.6
#2  2 0.05 0.1 0.05 0.30 0.10 0.4 0.4 0.4 0.2
#3  3 0.10 0.1 0.10 0.20 0.20 0.3 0.3 0.4 0.3
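The same split-then-rowSums idea can be sketched with the newer tidyselect helper all_of(), which supersedes one_of() for selecting columns from a character vector. This assumes dplyr 1.0.0 or later; groups, sums and result are illustrative names. Splitting only the class column by fg gives a named list of character vectors, and map() keeps those names, so the new columns come out as X, Y and Z.

```r
library(dplyr)
library(purrr)

class_df <- data.frame(
  id = 1:3,
  A = c(0.20, 0.05, 0.10), B = c(0.30, 0.10, 0.10), C = c(0.10, 0.05, 0.10),
  D = c(0.15, 0.30, 0.20), E = c(0.25, 0.10, 0.20), F = c(0.00, 0.40, 0.30)
)
class_fg <- data.frame(
  class = c("A", "B", "C", "D", "E", "F"),
  fg    = c("Z", "Z", "Z", "Y", "Y", "X"),
  stringsAsFactors = FALSE
)

# Named list: functional group -> character vector of its class columns
groups <- split(class_fg$class, class_fg$fg)

# Row sums over each group's columns; names X, Y, Z are preserved
sums <- map(groups, ~ rowSums(select(class_df, all_of(.x))))

# Turn the named list into columns and append them to the original data
result <- bind_cols(class_df, as_tibble(sums))
print(result)
```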

Alternatively, group and nest, then compute the rowSums by mapping over the nested list column:

class_fg %>% 
   group_by(fg) %>%
   nest() %>%
   mutate(out = map(data, ~  class_df %>%
                               select(one_of(.x$class)) %>% 
                               rowSums)) %>% 
   select(-data)  %>%
   unnest(out) %>%   # tidyr >= 1.0 expects the column to unnest
   unstack(., out ~ fg) %>% 
   bind_cols(class_df, .)

answered Oct 16 '22 21:10

akrun

It is often easier to work with data in long format. Hence, convert class_df to long format using tidyr::gather and join it with class_fg. Perform the summation in long format, then spread back to wide format to match the expected result.

library(tidyverse)

class_df %>% gather(key, value, -id) %>% 
  inner_join(class_fg, by=c("key" = "class")) %>%
  group_by(id, fg) %>%
  summarise(value = sum(value)) %>%
  spread(fg, value) %>%
  inner_join(class_df, by="id") %>% as.data.frame()

#   id   X   Y   Z    A   B    C    D    E   F
# 1  1 0.0 0.4 0.6 0.20 0.3 0.10 0.15 0.25 0.0
# 2  2 0.4 0.4 0.2 0.05 0.1 0.05 0.30 0.10 0.4
# 3  3 0.3 0.4 0.3 0.10 0.1 0.10 0.20 0.20 0.3

Data:

class_fg <- read.table(text = 
"class         fg
                 A          Z
                 B          Z
                 C          Z
                 D          Y
                 E          Y
                 F          X",
header = TRUE, stringsAsFactors = FALSE)

class_df  <- read.table(text = 
"id    A    B    C    D    E    F
1 0.20 0.30 0.10 0.15 0.25 0.00 
2 0.05 0.10 0.05 0.30 0.10 0.40
3 0.10 0.10 0.10 0.20 0.20 0.30",
header = TRUE, stringsAsFactors = FALSE)
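In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A sketch of the same long-format pipeline with those functions, assuming dplyr 1.0.0 or later for the .groups argument of summarise(); result is an illustrative name.

```r
library(dplyr)
library(tidyr)

class_df <- data.frame(
  id = 1:3,
  A = c(0.20, 0.05, 0.10), B = c(0.30, 0.10, 0.10), C = c(0.10, 0.05, 0.10),
  D = c(0.15, 0.30, 0.20), E = c(0.25, 0.10, 0.20), F = c(0.00, 0.40, 0.30)
)
class_fg <- data.frame(
  class = c("A", "B", "C", "D", "E", "F"),
  fg    = c("Z", "Z", "Z", "Y", "Y", "X"),
  stringsAsFactors = FALSE
)

result <- class_df %>%
  # one row per (id, class) pair
  pivot_longer(-id, names_to = "class", values_to = "value") %>%
  # attach the functional group of each class
  inner_join(class_fg, by = "class") %>%
  # total proportion per image and functional group
  group_by(id, fg) %>%
  summarise(value = sum(value), .groups = "drop") %>%
  # back to one column per functional group
  pivot_wider(names_from = fg, values_from = value) %>%
  # put the original class columns back
  inner_join(class_df, by = "id")
print(result)
```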

answered Oct 16 '22 20:10

MKR

My usual approach is to stick to base R as long as the data sets don't get too large. In your case, a base R solution would be:

class_df=as.data.frame(
  c(class_df,
    lapply(split(class_fg,class_fg$fg),
           function(x) rowSums(class_df[,x$class,drop=FALSE]))))
class_df
#  id    A   B    C    D    E   F   X   Y   Z
#1  1 0.20 0.3 0.10 0.15 0.25 0.0 0.0 0.4 0.6
#2  2 0.05 0.1 0.05 0.30 0.10 0.4 0.4 0.4 0.2
#3  3 0.10 0.1 0.10 0.20 0.20 0.3 0.3 0.4 0.3
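The drop=FALSE in the base R solution matters for groups with a single class, such as X, which contains only F. A small sketch of why (single, err and kept are illustrative names): without drop=FALSE, selecting one column returns a plain vector, and rowSums() rejects anything that is not at least two-dimensional.

```r
class_df <- data.frame(
  id = 1:3,
  A = c(0.20, 0.05, 0.10), B = c(0.30, 0.10, 0.10), C = c(0.10, 0.05, 0.10),
  D = c(0.15, 0.30, 0.20), E = c(0.25, 0.10, 0.20), F = c(0.00, 0.40, 0.30)
)

# Default drop = TRUE: the one-column result collapses to a numeric vector
single <- class_df[, "F"]
print(is.vector(single))

# rowSums() on a plain vector raises an error; capture its message
err <- tryCatch(rowSums(single), error = function(e) conditionMessage(e))
print(err)

# drop = FALSE keeps a one-column data frame, which rowSums() accepts
kept <- class_df[, "F", drop = FALSE]
print(rowSums(kept))
```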

If the data sets get too large, I use data.table. A data.table solution for your problem:

library(data.table)

class_dt = data.table(class_df)
grps = split(class_fg, class_fg$fg)

for (g in grps) class_dt[, c(g$fg[1]) := rowSums(.SD), .SDcols = g$class]
class_dt
#   id    A   B    C    D    E   F   X   Y   Z
#1:  1 0.20 0.3 0.10 0.15 0.25 0.0 0.0 0.4 0.6
#2:  2 0.05 0.1 0.05 0.30 0.10 0.4 0.4 0.4 0.2
#3:  3 0.10 0.1 0.10 0.20 0.20 0.3 0.3 0.4 0.3
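The loop can also be collapsed into a single := call. This is a sketch of a variant, not the answer's own code: the left-hand side is the vector of new column names (wrapped in parentheses so data.table evaluates it), and the right-hand side is a list of row sums, each taken over that group's subset of .SD. It assumes the data.table package is installed; grps here is a named list of column-name vectors rather than the list of data frames used above.

```r
library(data.table)

class_df <- data.frame(
  id = 1:3,
  A = c(0.20, 0.05, 0.10), B = c(0.30, 0.10, 0.10), C = c(0.10, 0.05, 0.10),
  D = c(0.15, 0.30, 0.20), E = c(0.25, 0.10, 0.20), F = c(0.00, 0.40, 0.30)
)
class_fg <- data.frame(
  class = c("A", "B", "C", "D", "E", "F"),
  fg    = c("Z", "Z", "Z", "Y", "Y", "X"),
  stringsAsFactors = FALSE
)

class_dt <- data.table(class_df)

# Named list: functional group -> its class column names
grps <- split(class_fg$class, class_fg$fg)

# Add all group columns by reference in one call
class_dt[, (names(grps)) := lapply(grps, function(cols) rowSums(.SD[, cols, with = FALSE]))]
print(class_dt)
```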

answered Oct 16 '22 22:10

cryo111

