This is somewhat related to this question:
In principle, I am trying to understand how rowwise operations with mutate across multiple columns work when applying more than one function (mean(), sum(), min(), etc.).
I have learned that across does this job and not c_across.
I have learned that mean() differs from min() in that mean() doesn't work on data frames, so the input has to be converted to a vector first, which can be done with unlist() or as.matrix(). I learned this from Ronak Shah here: Understanding rowwise() and c_across()
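A minimal base R sketch of that difference, assuming a small all-numeric data frame. The name num_df and its values are hypothetical and not part of the data below:

```r
# Hypothetical all-numeric data frame, used only for illustration
num_df <- data.frame(a = 1:3, b = 4:6)

# min() has a data-frame method, so it can be called on the data frame directly
min_direct <- min(num_df)

# mean() has no data-frame method: it warns and returns NA
mean_direct <- suppressWarnings(mean(num_df))

# Flattening to a vector (or converting to a matrix) first makes mean() work
mean_vector <- mean(unlist(num_df))
mean_matrix <- mean(as.matrix(num_df))
```

Here min_direct is 1, mean_direct is NA, and both mean_vector and mean_matrix are 3.5.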
Now to my actual case: I was able to do this task, but I lose one column, d. How can I avoid losing column d in this setting?
My df:
df <- structure(list(a = 1:5, b = 6:10, c = 11:15, d = c("a", "b",
"c", "d", "e"), e = 1:5), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
Does not work:
df %>%
  rowwise() %>%
  mutate(across(a:e),
         avg = mean(unlist(cur_data()), na.rm = TRUE),
         min = min(unlist(cur_data()), na.rm = TRUE),
         max = max(unlist(cur_data()), na.rm = TRUE)
  )
# Output:
      a     b     c d         e   avg min   max
  <int> <int> <int> <chr> <int> <dbl> <chr> <chr>
1     1     6    11 a         1    NA 1     a
2     2     7    12 b         2    NA 12    b
3     3     8    13 c         3    NA 13    c
4     4     9    14 d         4    NA 14    d
5     5    10    15 e         5    NA 10    e
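A likely explanation for this output, sketched in base R: unlist() on a row that contains the character column d coerces every value to character, so mean() returns NA and min()/max() compare strings rather than numbers. The vector row2 below is a hand-built stand-in for the second row of df, not actual output from cur_data():

```r
# Hand-built stand-in for the second row of df (an assumption for illustration)
row2 <- unlist(list(a = 2L, b = 7L, c = 12L, d = "b", e = 2L))

# Mixing integers with a string coerces everything to character
row2_class <- class(row2)

# mean() on a character vector warns and returns NA
avg2 <- suppressWarnings(mean(row2, na.rm = TRUE))

# min() and max() fall back to string comparison
min2 <- min(row2)   # "12" sorts before "2" and "7"
max2 <- max(row2)   # letters sort after digits
```

This reproduces the NA, "12" and "b" shown in the second row of the output above.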
Works, but I lose column d:
df %>%
  select(-d) %>%
  rowwise() %>%
  mutate(across(a:e),
         avg = mean(unlist(cur_data()), na.rm = TRUE),
         min = min(unlist(cur_data()), na.rm = TRUE),
         max = max(unlist(cur_data()), na.rm = TRUE)
  )
      a     b     c     e   avg   min   max
  <int> <int> <int> <int> <dbl> <dbl> <dbl>
1     1     6    11     1  4.75     1    11
2     2     7    12     2  5.75     2    12
3     3     8    13     3  6.75     3    13
4     4     9    14     4  7.75     4    14
5     5    10    15     5  8.75     5    15
Using pmap() from purrr might be preferable, since you need to select the data just once and you can use the select helpers:
library(dplyr)
library(purrr)

df %>%
  mutate(pmap_dfr(across(where(is.numeric)),
                  ~ data.frame(max = max(c(...)),
                               min = min(c(...)),
                               avg = mean(c(...)))))
      a     b     c d         e   max   min   avg
  <int> <int> <int> <chr> <int> <int> <int> <dbl>
1     1     6    11 a         1    11     1  4.75
2     2     7    12 b         2    12     2  5.75
3     3     8    13 c         3    13     3  6.75
4     4     9    14 d         4    14     4  7.75
5     5    10    15 e         5    15     5  8.75
Or with the addition of tidyr:
library(tidyr)

df %>%
  mutate(res = pmap(across(where(is.numeric)),
                    ~ list(max = max(c(...)),
                           min = min(c(...)),
                           avg = mean(c(...))))) %>%
  unnest_wider(res)
Edit:
The best way to do this here:
df %>%
  rowwise() %>%
  mutate(min = min(c_across(a:e & where(is.numeric)), na.rm = TRUE),
         max = max(c_across(a:e & where(is.numeric)), na.rm = TRUE),
         avg = mean(c_across(a:e & where(is.numeric)), na.rm = TRUE)
  )
# A tibble: 5 x 8
# Rowwise:
      a     b     c d         e   min   max   avg
  <int> <int> <int> <chr> <int> <int> <int> <dbl>
1     1     6    11 a         1     1    11  4.75
2     2     7    12 b         2     2    12  5.75
3     3     8    13 c         3     3    13  6.75
4     4     9    14 d         4     4    14  7.75
5     5    10    15 e         5     5    15  8.75
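Earlier Answer
Your "works" version won't even work properly if you change the order of the new columns, because cur_data() also picks up the columns created earlier in the same mutate() call (here avg is averaged over min and max as well); see:
df %>%
  select(-d) %>%
  rowwise() %>%
  mutate(across(a:e),
         min = min(unlist(cur_data()), na.rm = TRUE),
         max = max(unlist(cur_data()), na.rm = TRUE),
         avg = mean(unlist(cur_data()), na.rm = TRUE)
  )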
# A tibble: 5 x 7
# Rowwise:
      a     b     c     e   min   max   avg
  <int> <int> <int> <int> <int> <int> <dbl>
1     1     6    11     1     1    11  5.17
2     2     7    12     2     2    12  6.17
3     3     8    13     3     3    13  7.17
4     4     9    14     4     4    14  8.17
5     5    10    15     5     5    15  9.17
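The 5.17 in the first row can be reproduced by hand in base R, assuming that by the time avg is computed the row also contains the freshly created min and max values. The vector names below are illustrative only:

```r
# First row of the numeric columns a, b, c, e
orig <- c(a = 1, b = 6, c = 11, e = 1)

# Average over the original columns only
avg_expected <- mean(orig)

# Average when min and max have already been added to the row
with_new <- c(orig, min = min(orig), max = max(orig))
avg_polluted <- mean(with_new)
```

avg_expected is 4.75, while avg_polluted is 31 / 6, which rounds to the 5.17 shown above.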
Therefore, it is advisable to do it like this:
df %>%
  select(-d) %>%
  rowwise() %>%
  mutate(min = min(c_across(a:e), na.rm = TRUE),
         max = max(c_across(a:e), na.rm = TRUE),
         avg = mean(c_across(a:e), na.rm = TRUE)
  )
# A tibble: 5 x 7
# Rowwise:
      a     b     c     e   min   max   avg
  <int> <int> <int> <int> <int> <int> <dbl>
1     1     6    11     1     1    11  4.75
2     2     7    12     2     2    12  5.75
3     3     8    13     3     3    13  6.75
4     4     9    14     4     4    14  7.75
5     5    10    15     5     5    15  8.75
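One more alternative is to name the columns explicitly, which keeps column d in the result:
cols <- c('a', 'b', 'c', 'e')
df %>%
  rowwise() %>%
  # all_of() marks cols as an external character vector of column names
  mutate(min = min(c_across(all_of(cols)), na.rm = TRUE),
         max = max(c_across(all_of(cols)), na.rm = TRUE),
         avg = mean(c_across(all_of(cols)), na.rm = TRUE)
  )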
# A tibble: 5 x 8
# Rowwise:
      a     b     c d         e   min   max   avg
  <int> <int> <int> <chr> <int> <int> <int> <dbl>
1     1     6    11 a         1     1    11  4.75
2     2     7    12 b         2     2    12  5.75
3     3     8    13 c         3     3    13  6.75
4     4     9    14 d         4     4    14  7.75
5     5    10    15 e         5     5    15  8.75
Even the group_by approach suggested by @Sinh won't work properly in these cases.
Here is one method that preserves the column through mutate: move that column into the row names attribute (column_to_rownames), apply the transformation to the remaining columns, and then turn the row names back into a column (rownames_to_column).
library(dplyr)
library(tibble)
library(purrr)
df %>%
  column_to_rownames('d') %>%
  mutate(max = reduce(., pmax), min = reduce(., pmin),
         avg = rowMeans(.)) %>%
  rownames_to_column('d')
#   d a  b  c e max min  avg
# 1 a 1  6 11 1  11   1 4.75
# 2 b 2  7 12 2  12   2 5.75
# 3 c 3  8 13 3  13   3 6.75
# 4 d 4  9 14 4  14   4 7.75
# 5 e 5 10 15 5  15   5 8.75
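For comparison, the same row-wise summaries can be sketched in base R without moving d into the row names. This is an alternative sketch, not the method from the answers above; it rebuilds df as a plain data.frame, and the helper names is_num and num are illustrative:

```r
# Same data as in the question, as a plain data.frame
df <- data.frame(a = 1:5, b = 6:10, c = 11:15,
                 d = c("a", "b", "c", "d", "e"), e = 1:5)

# Select the numeric columns once; d is left out of the computation only
is_num <- vapply(df, is.numeric, logical(1))
num <- df[is_num]

# pmax()/pmin() are vectorised across columns; unname() keeps the column
# names from being passed on as argument names
df$max <- do.call(pmax, unname(as.list(num)))
df$min <- do.call(pmin, unname(as.list(num)))
df$avg <- rowMeans(num)
```

Because the summaries are computed from num rather than from df itself, column d stays in place and the new columns cannot leak into each other's results.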