Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Combine: rowwise(), mutate(), across(), for multiple functions

This is somehow related to this question: In principle I try to understand how rowwise operations with mutate across multiple columns applying more then 1 functions like (mean(), sum(), min() etc..) work.

I have learned that across does this job and not c_across. I have learned that the function mean() is different to the function min() in that way that mean() doesn't work on dataframes and we need to change it to vector which can be done with unlist or as.matrix -> learned from Ronak Shah hereUnderstanding rowwise() and c_across()

Now with my actual case: I was able to do this task but I loose one column d. How can I avoid the loose of the column d in this setting.

My df:

df <- structure(list(a = 1:5, b = 6:10, c = 11:15, d = c("a", "b", 
"c", "d", "e"), e = 1:5), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

Works not:

df %>% 
  rowwise() %>% 
  mutate(across(a:e), 
         avg = mean(unlist(cur_data()), na.rm = TRUE),
         min = min(unlist(cur_data()), na.rm = TRUE), 
         max = max(unlist(cur_data()), na.rm = TRUE)
  )

# Output:
      a     b     c d         e   avg min   max  
  <int> <int> <int> <chr> <int> <dbl> <chr> <chr>
1     1     6    11 a         1    NA 1     a    
2     2     7    12 b         2    NA 12    b    
3     3     8    13 c         3    NA 13    c    
4     4     9    14 d         4    NA 14    d    
5     5    10    15 e         5    NA 10    e 

Works, but I loose column d:

df %>% 
  select(-d) %>% 
  rowwise() %>% 
  mutate(across(a:e), 
         avg = mean(unlist(cur_data()), na.rm = TRUE),
         min = min(unlist(cur_data()), na.rm = TRUE), 
         max = max(unlist(cur_data()), na.rm = TRUE)
  )

      a     b     c     e   avg   min   max
  <int> <int> <int> <int> <dbl> <dbl> <dbl>
1     1     6    11     1  4.75     1    11
2     2     7    12     2  5.75     2    12
3     3     8    13     3  6.75     3    13
4     4     9    14     4  7.75     4    14
5     5    10    15     5  8.75     5    15
like image 667
TarJae Avatar asked May 01 '21 14:05

TarJae


People also ask

What does rowwise () do in R?

rowwise() allows you to compute on a data frame a row-at-a-time. This is most useful when a vectorised function doesn't exist. Most dplyr verbs preserve row-wise grouping.

How do I apply a function across all columns in R?

Apply any function to all R data frame You can set the MARGIN argument to c(1, 2) or, equivalently, to 1:2 to apply the function to each value of the data frame. If you set MARGIN = c(2, 1) instead of c(1, 2) the output will be the same matrix but transposed. The output is of class “matrix” instead of “data.

How do I sum rows in Dplyr?

Syntax: mutate(new-col-name = rowSums(.)) The rowSums() method is used to calculate the sum of each row and then append the value at the end of each row under the new column name specified. The argument . is used to apply the function over all the cells of the data frame.

How do I specify multiple columns in R?

To pick out single or multiple columns use the select() function. The select() function expects a dataframe as it's first input ('argument', in R language), followed by the names of the columns you want to extract with a comma between each name.


Video Answer


3 Answers

Using pmap() from purrr might be more preferable since you need to select the data just once and you can use the select helpers:

df %>% 
 mutate(pmap_dfr(across(where(is.numeric)),
                 ~ data.frame(max = max(c(...)),
                              min = min(c(...)),
                              avg = mean(c(...)))))

      a     b     c d         e   max   min   avg
  <int> <int> <int> <chr> <int> <int> <int> <dbl>
1     1     6    11 a         1    11     1  4.75
2     2     7    12 b         2    12     2  5.75
3     3     8    13 c         3    13     3  6.75
4     4     9    14 d         4    14     4  7.75
5     5    10    15 e         5    15     5  8.75

Or with the addition of tidyr:

df %>% 
 mutate(res = pmap(across(where(is.numeric)),
                   ~ list(max = max(c(...)),
                          min = min(c(...)),
                          avg = mean(c(...))))) %>%
 unnest_wider(res)
like image 166
tmfmnk Avatar answered Oct 18 '22 04:10

tmfmnk


Edit:

Best way out here

df %>%
  rowwise() %>% 
  mutate(min = min(c_across(a:e & where(is.numeric)), na.rm = TRUE),
         max = max(c_across(a:e & where(is.numeric)), na.rm = TRUE), 
         avg = mean(c_across(a:e & where(is.numeric)), na.rm = TRUE)
  )

# A tibble: 5 x 8
# Rowwise: 
      a     b     c d         e   min   max   avg
  <int> <int> <int> <chr> <int> <int> <int> <dbl>
1     1     6    11 a         1     1    11  4.75
2     2     7    12 b         2     2    12  5.75
3     3     8    13 c         3     3    13  6.75
4     4     9    14 d         4     4    14  7.75
5     5    10    15 e         5     5    15  8.75

Earlier Answer Your this will work won't even work properly, if you change the output sequence, see

df %>% 
  select(-d) %>% 
  rowwise() %>% 
  mutate(across(a:e), 
         min = min(unlist(cur_data()), na.rm = TRUE),
         max = max(unlist(cur_data()), na.rm = TRUE), 
         avg = mean(unlist(cur_data()), na.rm = TRUE)
  )

# A tibble: 5 x 7
# Rowwise: 
      a     b     c     e   min   max   avg
  <int> <int> <int> <int> <int> <int> <dbl>
1     1     6    11     1     1    11  5.17
2     2     7    12     2     2    12  6.17
3     3     8    13     3     3    13  7.17
4     4     9    14     4     4    14  8.17
5     5    10    15     5     5    15  9.17

Therefore, it is advised to do it like this-

df %>% 
  select(-d) %>% 
  rowwise() %>% 
  mutate(min = min(c_across(a:e), na.rm = TRUE),
         max = max(c_across(a:e), na.rm = TRUE), 
         avg = mean(c_across(a:e), na.rm = TRUE)
  )

# A tibble: 5 x 7
# Rowwise: 
      a     b     c     e   min   max   avg
  <int> <int> <int> <int> <int> <int> <dbl>
1     1     6    11     1     1    11  4.75
2     2     7    12     2     2    12  5.75
3     3     8    13     3     3    13  6.75
4     4     9    14     4     4    14  7.75
5     5    10    15     5     5    15  8.75

One more alternative is

cols <- c('a', 'b', 'c', 'e')
df %>%
  rowwise() %>% 
  mutate(min = min(c_across(cols), na.rm = TRUE),
         max = max(c_across(cols), na.rm = TRUE), 
         avg = mean(c_across(cols), na.rm = TRUE)
  )

# A tibble: 5 x 8
# Rowwise: 
      a     b     c d         e   min   max   avg
  <int> <int> <int> <chr> <int> <int> <int> <dbl>
1     1     6    11 a         1     1    11  4.75
2     2     7    12 b         2     2    12  5.75
3     3     8    13 c         3     3    13  6.75
4     4     9    14 d         4     4    14  7.75
5     5    10    15 e         5     5    15  8.75

Even @Sinh suggested approach of group_by won't work properly in these cases.

like image 23
AnilGoyal Avatar answered Oct 18 '22 03:10

AnilGoyal


Here is one method which would preserve the data.frame attribute in mutate if we want to set a particular column to row name attribute (column_to_rownames) and then return the attribute after the transformation

library(dplyr)
library(tibble)
library(purrr)
df %>% 
   column_to_rownames('d') %>%
   mutate(max = reduce(., pmax), min = reduce(., pmin), 
         avg = rowMeans(.)) %>% 
   rownames_to_column('d')
#  d a  b  c e max min  avg
#1 a 1  6 11 1  11   1 4.75
#2 b 2  7 12 2  12   2 5.75
#3 c 3  8 13 3  13   3 6.75
#4 d 4  9 14 4  14   4 7.75
#5 e 5 10 15 5  15   5 8.75
like image 34
akrun Avatar answered Oct 18 '22 02:10

akrun