How can I mutate all columns containing a pattern (mutate_at
I guess) using each time the previous column using dplyr
?
--> Here for example all column continaing foo
in their name should be mutated using the column just before (i.e., a
for column fooa
, b
for foob
and so on).
set.seed(13)
dfrows = 5
df = data.frame(a = rnorm(dfrows),
fooa = runif(dfrows),
b = rnorm(dfrows, mean=50, sd=5),
foob = runif(dfrows, min=0, max=5),
c = rnorm(dfrows, mean=100, sd=10),
fooc = runif(dfrows, min=0, max=10))
df
# a fooa b foob c fooc
# 1 0.5543269 0.6611216 48.26791 3.0999527 98.06053 6.035485
# 2 -0.2802719 0.8783709 51.15647 0.1586242 113.96432 2.299504
# 3 1.7751634 0.8905590 52.34582 2.3070636 101.00663 9.668332
# 4 0.1873201 0.5662805 50.58978 1.6501046 98.85561 6.045547
# 5 1.1425261 0.5935473 50.35224 3.1676038 107.02225 6.396047
library(dplyr)
df %>% mutate(fooa = fooa/100 * a,
foob = foob/100 * b,
fooc = fooc/100 * c)
# a fooa b foob c fooc
# 1 0.5543269 0.003664775 48.26791 1.49628246 98.06053 5.918428
# 2 -0.2802719 -0.002461827 51.15647 0.08114656 113.96432 2.620614
# 3 1.7751634 0.015808878 52.34582 1.20765132 101.00663 9.765657
# 4 0.1873201 0.001060757 50.58978 0.83478430 98.85561 5.976363
# 5 1.1425261 0.006781434 50.35224 1.59495949 107.02225 6.845194
# Equivalently, in base R:
for (i in c(2, 4, 6)) {
df[,i] = df[,i]/100 * df[, i-1]
}
So I am looking for something like this I guess:
# What should <PREVIOUS_COLUMN> be?
df %>% mutate_at(vars(contains('foo')), funs(./100 * <PREVIOUS_COLUMN>))
# OR, even better (more generic but in my case it will always be the previous column):
df %>% mutate_at(vars(contains('foo')), funs(./100 * <COLUMN_NAME_WITH_'foo'_PATTERN_REMOVED>))
EDIT: I should have mentioned that the original data.frame
could contain more columns, possibly with another pattern than X then fooX
, so that the ideal solution should be able to localize them properly (but I'll leave it as such as all answers provide nice solutions and features).
A better example would have been:
set.seed(13)
dfrows = 5
df = data.frame(a = rnorm(dfrows),
fooa = runif(dfrows),
b = rnorm(dfrows, mean=50, sd=5),
foob = runif(dfrows, min=0, max=5),
bla = 5,
c = rnorm(dfrows, mean=100, sd=10),
fooc = runif(dfrows, min=0, max=10),
blo = 8)
df
# a fooa b foob bla c fooc blo
# 1 0.5543269 0.6611216 48.26791 3.0999527 5 98.06053 6.035485 8
# 2 -0.2802719 0.8783709 51.15647 0.1586242 5 113.96432 2.299504 8
# 3 1.7751634 0.8905590 52.34582 2.3070636 5 101.00663 9.668332 8
# 4 0.1873201 0.5662805 50.58978 1.6501046 5 98.85561 6.045547 8
# 5 1.1425261 0.5935473 50.35224 3.1676038 5 107.02225 6.396047 8
This is actually vectorised operation so you can do this without any loops.
Using vector recycling :
df[, c(FALSE, TRUE)] <- df[, c(FALSE, TRUE)]/100 * df[, c(TRUE, FALSE)]
df
# a fooa b foob c fooc
#1 0.5543269 0.003664775 48.26791 1.49628246 98.06053 5.918428
#2 -0.2802719 -0.002461827 51.15647 0.08114656 113.96432 2.620614
#3 1.7751634 0.015808878 52.34582 1.20765132 101.00663 9.765657
#4 0.1873201 0.001060757 50.58978 0.83478430 98.85561 5.976363
#5 1.1425261 0.006781434 50.35224 1.59495949 107.02225 6.845194
If you want to perform this based on names of the column.
cols <- grep('foo', names(df))
df[cols] <- df[cols]/100 * df[-cols]
Here is another approach using across()
and cur_column()
. I personally would not recommend to make calculations based on the position of columns, and would instead recommend to work with the columns names, since this seems safer.
In the example below we loop over the columns a
,b
and c
with across
and access the values of each corresponding foo
column by using get()
and cur_column
.
set.seed(13)
dfrows = 5
df = data.frame(a = rnorm(dfrows),
fooa = runif(dfrows),
b = rnorm(dfrows, mean=50, sd=5),
foob = runif(dfrows, min=0, max=5),
c = rnorm(dfrows, mean=100, sd=10),
fooc = runif(dfrows, min=0, max=10))
library(dplyr)
df %>%
mutate(across(matches("^[a-z]$"),
~ get(paste0("foo", cur_column())) / 100 * .x,
.names = "foo{col}"))
#> a fooa b foob c fooc
#> 1 0.5543269 0.003664775 48.26791 1.49628246 98.06053 5.918428
#> 2 -0.2802719 -0.002461827 51.15647 0.08114656 113.96432 2.620614
#> 3 1.7751634 0.015808878 52.34582 1.20765132 101.00663 9.765657
#> 4 0.1873201 0.001060757 50.58978 0.83478430 98.85561 5.976363
#> 5 1.1425261 0.006781434 50.35224 1.59495949 107.02225 6.845194
Created on 2021-01-27 by the reprex package (v0.3.0)
One option could be:
df %>%
mutate(across(starts_with("foo"))/100 * across(!matches("foo")))
a fooa b foob c fooc
1 0.5543269 0.003664775 48.26791 1.49628246 98.06053 5.918428
2 -0.2802719 -0.002461827 51.15647 0.08114656 113.96432 2.620614
3 1.7751634 0.015808878 52.34582 1.20765132 101.00663 9.765657
4 0.1873201 0.001060757 50.58978 0.83478430 98.85561 5.976363
5 1.1425261 0.006781434 50.35224 1.59495949 107.02225 6.845194
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With