An incredibly common operation for my type of data is applying a normalisation factor to all columns. This can be done efficiently using sweep or scale:
normalized = scale(data, center = FALSE, scale = factors)
# or
normalized = sweep(data, 2, factors, `/`)
Where
data = structure(list(A = c(3L, 174L, 6L, 1377L, 537L, 173L),
B = c(1L, 128L, 2L, 1019L, 424L, 139L),
C = c(3L, 66L, 2L, 250L, 129L, 40L),
D = c(4L, 57L, 4L, 251L, 124L, 38L)),
.Names = c("A", "B", "C", "D"),
class = c("tbl_df", "data.frame"), row.names = c(NA, -6L))
factors = c(A = 1, B = 1.2, C = 0.8, D = 0.75)
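For reference, here is a minimal base R sketch of what the two calls above do (the names by_sweep and by_scale are only for illustration). sweep() with MARGIN = 2 matches the factors to columns, so column j is divided by factors[j]. scale() first converts the data to a matrix and records the divisors in a "scaled:scale" attribute, so the numbers agree but the return types differ.

```r
# Same numeric columns as in the question (a plain data.frame is enough here)
data <- data.frame(A = c(3L, 174L, 6L, 1377L, 537L, 173L),
                   B = c(1L, 128L, 2L, 1019L, 424L, 139L),
                   C = c(3L, 66L, 2L, 250L, 129L, 40L),
                   D = c(4L, 57L, 4L, 251L, 124L, 38L))
factors <- c(A = 1, B = 1.2, C = 0.8, D = 0.75)

# MARGIN = 2: STATS is matched to columns, so column j is divided by factors[j]
by_sweep <- sweep(data, 2, factors, `/`)

# scale() coerces to a matrix and stores the divisors as an attribute
by_scale <- scale(data, center = FALSE, scale = factors)

class(by_sweep)                 # still a data.frame
class(by_scale)                 # a matrix
attr(by_scale, "scaled:scale")  # the factors vector
```

If you need to stay in a data frame (e.g. a tibble) afterwards, sweep() is the more convenient of the two.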
However, how do I do this with dplyr when my data has additional columns in front? I can do it in separate statements, but I’d like to do it in one pipeline. This is my data:
data = structure(list(ID = c(1, 2, 3, 4, 5, 6),
Type = c("X", "X", "X", "Y", "Y", "Y"),
A = c(3L, 174L, 6L, 1377L, 537L, 173L),
B = c(1L, 128L, 2L, 1019L, 424L, 139L),
C = c(3L, 66L, 2L, 250L, 129L, 40L),
D = c(4L, 57L, 4L, 251L, 124L, 38L)),
.Names = c("ID", "Type", "A", "B", "C", "D"),
class = c("tbl_df", "data.frame"), row.names = c(NA, -6L))
And I’d like to mutate the data columns without touching the first two columns. Normally I can do this with mutate_each; however, I cannot see how to pass my normalisation factors to that function:
data %>% mutate_each(funs(. / factors), A:D)
This, unsurprisingly, divides each column by the whole factors vector (recycled down the rows), rather than each column by its matching factor.
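The failure comes from R's recycling rule: inside funs(), . stands for one whole column (6 values), so . / factors divides it element-wise by a 4-value vector that R recycles as 1, 1.2, 0.8, 0.75, 1, 1.2, warning that 6 is not a multiple of 4. A small base R illustration using column A (the names col_A, wrong, right and warned are made up for this example):

```r
factors <- c(A = 1, B = 1.2, C = 0.8, D = 0.75)
col_A <- c(3L, 174L, 6L, 1377L, 537L, 173L)  # column A from the question

# What funs(. / factors) effectively computes for column A:
# factors is recycled to length 6 (the warning is silenced here)
wrong <- suppressWarnings(col_A / factors)

# R does signal a warning for the length mismatch
warned <- tryCatch({ col_A / factors; FALSE }, warning = function(w) TRUE)

# What is actually wanted: column A divided only by its own factor
right <- col_A / factors[["A"]]
```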
Given akrun's encouragement, let me post what I did as an answer here. I intuitively thought that you might be able to ask R to match up columns with the same name in this mutate_each. For instance, if . indicates the column A, I thought another column named A from another data.frame might be something dplyr would accept. So I created a data frame for factors and then used mutate_each. The outcome seems to be right. Since I have no technical background, I am afraid that I cannot really provide an explanation. I hope you do not mind.
factors <- data.frame(A = 1, B = 1.2, C = 0.8, D = 0.75)
mutate_at(data, vars(A:D), funs(. / factors$.))
# By the time I answered this question, the following was working.
# But mutate_each() is now deprecated.
# mutate_each(data, funs(. / factors$.), A:D)
# ID Type A B C D
#1 1 X 3 0.8333333 3.75 5.333333
#2 2 X 174 106.6666667 82.50 76.000000
#3 3 X 6 1.6666667 2.50 5.333333
#4 4 Y 1377 849.1666667 312.50 334.666667
#5 5 Y 537 353.3333333 161.25 165.333333
#6 6 Y 173 115.8333333 50.00 50.666667
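A possible explanation, which is an assumption and has not been checked against dplyr's source: funs() appears to rewrite every . in the expression into the name of the current column before evaluating it, so for column B the expression . / factors$. becomes B / factors$B. The base R sketch below imitates that rewrite with substitute(); the template object and the loop are only for illustration and are not how dplyr is implemented.

```r
data <- data.frame(ID = c(1, 2, 3, 4, 5, 6),
                   Type = c("X", "X", "X", "Y", "Y", "Y"),
                   A = c(3L, 174L, 6L, 1377L, 537L, 173L),
                   B = c(1L, 128L, 2L, 1019L, 424L, 139L),
                   C = c(3L, 66L, 2L, 250L, 129L, 40L),
                   D = c(4L, 57L, 4L, 251L, 124L, 38L),
                   stringsAsFactors = FALSE)
factors <- data.frame(A = 1, B = 1.2, C = 0.8, D = 0.75)

# The expression as written inside funs(); `.` is just a symbol here
template <- quote(. / factors$.)

for (col in c("A", "B", "C", "D")) {
  # Replace every `.` (including the one after `$`) with the column's name,
  # e.g. for "B" this yields the call B / factors$B
  call_for_col <- do.call(substitute, list(template, list(. = as.name(col))))
  # Evaluate with the data columns visible; `factors` is found in the caller's frame
  data[[col]] <- eval(call_for_col, envir = data)
}

data
```

Under this reading, factors$B simply looks up the element named B, which is why a one-row data frame and a named list behave the same way.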
EDIT
This also works with a named list. Since a data frame is a special kind of list, this is perhaps not surprising.
# Experiment
foo <- list(A = 1, B = 1.2, C = 0.8, D = 0.75)
mutate_at(data, vars(A:D), funs(. / foo$.))
# mutate_each(data, funs(. / foo$.), A:D)
# ID Type A B C D
#1 1 X 3 0.8333333 3.75 5.333333
#2 2 X 174 106.6666667 82.50 76.000000
#3 3 X 6 1.6666667 2.50 5.333333
#4 4 Y 1377 849.1666667 312.50 334.666667
#5 5 Y 537 353.3333333 161.25 165.333333
#6 6 Y 173 115.8333333 50.00 50.666667
From dplyr 1.0.0, you can do:
data %>%
rowwise() %>%
mutate(across(A:D)/factors)
ID Type A B C D
<dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 X 3 0.833 3.75 5.33
2 2 X 174 107. 82.5 76
3 3 X 6 1.67 2.5 5.33
4 4 Y 1377 849. 312. 335.
5 5 Y 537 353. 161. 165.
6 6 Y 173 116. 50 50.7
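rowwise() evaluates the expression once per row, which can get slow on large data. A column-wise alternative, assuming dplyr 1.0.0 or later where cur_column() is available inside across(), looks up each column's factor by name. This is a sketch rather than one of the original answers; normalized is an illustrative name.

```r
library(dplyr)

data <- tibble(ID = c(1, 2, 3, 4, 5, 6),
               Type = c("X", "X", "X", "Y", "Y", "Y"),
               A = c(3L, 174L, 6L, 1377L, 537L, 173L),
               B = c(1L, 128L, 2L, 1019L, 424L, 139L),
               C = c(3L, 66L, 2L, 250L, 129L, 40L),
               D = c(4L, 57L, 4L, 251L, 124L, 38L))
factors <- c(A = 1, B = 1.2, C = 0.8, D = 0.75)

# Inside across(), cur_column() returns the name of the column being processed,
# so each column is divided by its own entry of the named factors vector.
# ID and Type are not selected, so they pass through unchanged.
normalized <- data %>%
  mutate(across(A:D, ~ .x / factors[[cur_column()]]))

normalized
```

Because the lookup is by name, the order of factors does not have to match the column order, and a column with no matching factor raises a "subscript out of bounds" error instead of silently recycling.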