I will use the following data set to illustrate my questions:
my_df <- data.frame(
a = 1:10,
b = 10:1
)
colnames(my_df) <- c("a", "b")
Part 1
I use the mutate() function to create two new variables in my data set, and I would like to compute the row means of the two new columns inside the same mutate() call. However, I would really like to be able to use the select() helpers such as starts_with(), ends_with() or contains().
My first try:
my_df %>%
mutate(
a_2 = a^2,
b_2 = b^2,
mean = rowMeans(select(ends_with("2")))
)
Error in mutate_impl(.data, dots) :
Evaluation error: No tidyselect variables were registered.
I understand why there is an error - the select() function is not given any .data argument. So I changed the code in my second try by adding "." inside the select() function:
my_df %>%
mutate(
a_2 = a^2,
b_2 = b^2,
mean = rowMeans(select(., ends_with("2")))
)
a b a_2 b_2 mean
1 1 10 1 100 NaN
2 2 9 4 81 NaN
3 3 8 9 64 NaN
4 4 7 16 49 NaN
5 5 6 25 36 NaN
6 6 5 36 25 NaN
7 7 4 49 16 NaN
8 8 3 64 9 NaN
9 9 2 81 4 NaN
10 10 1 100 1 NaN
The new problem after the second try is that the mean column does not contain the mean of a_2 and b_2 as expected, but contains only NaNs. After studying the code a bit, I understood the second problem. The added "." in the select() function refers to the original my_df data frame, which does not have the a_2 and b_2 columns. So it makes sense that NaNs are produced, because I am asking R to compute the means of nonexistent values.
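The explanation above can be checked outside of mutate(). This is only a small sketch that reuses the my_df data from the question; the names no_match and nan_means are made up for the illustration.

```r
library(dplyr)

my_df <- data.frame(a = 1:10, b = 10:1)

# Inside mutate(), `.` is the data frame that was piped in, so it has
# no columns ending in "2" yet: select() keeps the rows but no columns.
no_match <- select(my_df, ends_with("2"))
dim(no_match)

# The mean over zero columns is 0/0, hence NaN for every row.
nan_means <- rowMeans(no_match)
```

So the NaNs do not come from a failed lookup of a_2 and b_2; they come from averaging an empty selection.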
I then tried to use dplyr functions such as current_vars() to see if it would make a difference:
my_df %>%
mutate(
a_2 = a^2,
b_2 = b^2,
mean = rowMeans(select(current_vars(), ends_with("2")))
)
Error in mutate_impl(.data, dots) :
Evaluation error: Variable context not set.
However, this is obviously NOT the way to use this function. The solution I found is to simply add a second mutate() call:
my_df %>%
mutate(
a_2 = a^2,
b_2 = b^2
) %>%
mutate(mean = rowMeans(select(., ends_with("2"))))
a b a_2 b_2 mean
1 1 10 1 100 50.5
2 2 9 4 81 42.5
3 3 8 9 64 36.5
4 4 7 16 49 32.5
5 5 6 25 36 30.5
6 6 5 36 25 30.5
7 7 4 49 16 32.5
8 8 3 64 9 36.5
9 9 2 81 4 42.5
10 10 1 100 1 50.5
Question 1: Is there any way to perform this task in the same mutate() call? Using a second mutate() call is not really an issue; however, I am curious to know if there is a way to refer to the currently existing variables. The mutate() function allows variables to be used as soon as they are created inside the same mutate() call; however, this becomes problematic when functions are nested, as shown in my example above.
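One possible way to stay inside a single mutate() call, assuming dplyr 1.1.0 or later, is pick(). It is not mentioned anywhere in this thread, so treat the following as a sketch rather than the established answer. It applies tidyselect helpers to the data as it stands at that point in the call, which should include columns created earlier in the same call. On dplyr 1.0.x, across(ends_with("2")) with no function argument played a similar role. The name result is made up for the illustration.

```r
library(dplyr)

my_df <- data.frame(a = 1:10, b = 10:1)

result <- my_df %>%
  mutate(
    a_2 = a^2,
    b_2 = b^2,
    # pick() selects from the current state of the data, so the
    # freshly created a_2 and b_2 are visible to ends_with()
    mean = rowMeans(pick(ends_with("2")))
  )

result
```

Unlike select(., ...), this does not go through the "." placeholder, which is bound to the data frame as it was before mutate() started.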
Part 2
I also realize that using rowMeans() works in my solution; however, it is not really a dplyr way of doing things, especially because I need to use select() inside it. So, I decided to use the rowwise() and mean() functions instead. But once again, I would like to use one of the select() helpers for that and not have to list all variables in a c() call. I tried:
my_df %>%
mutate(
a_2 = a^2,
b_2 = b^2
) %>%
rowwise() %>%
mutate(
mean = mean(ends_with("2"))
)
Error in mutate_impl(.data, dots) :
Evaluation error: No tidyselect variables were registered.
I suspect that the error in the code is due to the fact that ends_with() is not inside select(), but I am showing this to ask whether there is a way to list the variables I want without having to specify them individually.
Thank you for your time.
A bit late, but here is a solution to problem 1, for reference.
If you had to do it without pipes, you would write:
tmp1 = mutate(my_df, a_2 = a^2, b_2 = b^2)
tmp2 = select(tmp1, ends_with("2"))
tmp3 = rowMeans(tmp2)
tmp4 = mutate(tmp1, m=tmp3)
Or, with fewer intermediate steps:
tmp1 = mutate(my_df, a_2 = a^2, b_2 = b^2)
tmp4 = mutate(tmp1, m=rowMeans(select(tmp1, ends_with("2"))) )
Note that computing tmp4 requires using tmp1 twice. So in the piped version you will also need to reference "." explicitly a second time (as usual, the first reference is implicit, as the first argument to mutate()):
my_df %>%
mutate(a_2 = a^2, b_2 = b^2) %>%
mutate(mean = rowMeans(select(., ends_with("2"))) )
For problem 2: avoiding the call to rowMeans() is trickier, and maybe not desirable.
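If a rowwise() version is still wanted, one option (assuming dplyr 1.0.0 or later, and not something proposed in the question) is c_across(), which accepts the select() helpers and returns the selected values of the current row as a vector. A minimal sketch with the same my_df, where result_rowwise is a made-up name:

```r
library(dplyr)

my_df <- data.frame(a = 1:10, b = 10:1)

result_rowwise <- my_df %>%
  mutate(a_2 = a^2, b_2 = b^2) %>%
  rowwise() %>%
  # c_across() collects the matching columns of the current row
  # into one vector, which mean() can then average
  mutate(mean = mean(c_across(ends_with("2")))) %>%
  ungroup()  # drop the row-wise grouping again

result_rowwise
```

This evaluates mean() once per row, so on larger data it will generally be slower than the vectorised rowMeans(), which is one reason avoiding rowMeans() may not be desirable.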