How do I sweep specific columns with dplyr?

Tags: r, dplyr

An incredibly common operation for my type of data is applying a normalisation factor to all columns. This can be done efficiently using sweep or scale:

normalized = scale(data, center = FALSE, scale = factors)
# or
normalized = sweep(data, 2, factors, `/`)

Where

data = structure(list(A = c(3L, 174L, 6L, 1377L, 537L, 173L),
    B = c(1L, 128L, 2L, 1019L, 424L, 139L),
    C = c(3L, 66L, 2L, 250L, 129L, 40L),
    D = c(4L, 57L, 4L, 251L, 124L, 38L)),
    .Names = c("A", "B", "C", "D"),
    class = c("tbl_df", "data.frame"), row.names = c(NA, -6L))

factors = c(A = 1, B = 1.2, C = 0.8, D = 0.75)

However, how do I do this with dplyr when my data has additional columns in front? I can do it in separate statements, but I’d like to do it in one pipeline. This is my data:

data = structure(list(ID = c(1, 2, 3, 4, 5, 6),
    Type = c("X", "X", "X", "Y", "Y", "Y"),
    A = c(3L, 174L, 6L, 1377L, 537L, 173L),
    B = c(1L, 128L, 2L, 1019L, 424L, 139L),
    C = c(3L, 66L, 2L, 250L, 129L, 40L),
    D = c(4L, 57L, 4L, 251L, 124L, 38L)),
    .Names = c("ID", "Type", "A", "B", "C", "D"),
    class = c("tbl_df", "data.frame"), row.names = c(NA, -6L))

I’d like to mutate the data columns without touching the first two columns. Normally I can do this with mutate_each; however, I cannot pass my normalisation factors to that function:

data %>% mutate_each(funs(. / factors), A:D)

This, unsurprisingly, divides every column by the whole factors vector, recycling it along the rows, rather than dividing each column by its matching factor.
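To see what goes wrong, it may help to look at a single column in base R. This is a minimal sketch that assumes the `factors` vector from the question; `col_A`, `recycled` and `wanted` are illustrative names, not part of the original post.

```r
# Column A from the question, and the named vector of normalisation factors.
col_A <- c(3L, 174L, 6L, 1377L, 537L, 173L)
factors <- c(A = 1, B = 1.2, C = 0.8, D = 0.75)

# Dividing by the whole vector recycles all four factors down the rows.
# R warns here because 6 is not a multiple of 4, so the warning is suppressed.
recycled <- suppressWarnings(col_A / factors)

# What is actually wanted: every value in A divided by A's own factor.
wanted <- col_A / factors[["A"]]

print(recycled)
print(wanted)
```

The second value of `recycled` is 174 / 1.2 rather than 174 / 1, which is the mismatch described above.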

asked Feb 03 '15 12:02 by Konrad Rudolph



2 Answers

Given akrun's encouragement, let me post what I did as an answer here. My intuition was that you might want R to match columns by name in this mutate_each call. For instance, if . stands for column A, I thought dplyr might accept a column named A from another data frame. So I created a data frame for factors and then used mutate_each. The outcome appears to be right. Since I have no technical background, I am afraid I cannot really provide an explanation. I hope you do not mind that.

factors <- data.frame(A = 1, B = 1.2, C = 0.8, D = 0.75)

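# Look up each column's factor by name in the factors data frame.
# Note: funs() has itself been deprecated since dplyr 0.8.0.
mutate_at(data, vars(A:D), funs(. / factors$.))

# At the time I answered this question, the following worked,
# but mutate_each() is now deprecated.

# mutate_each(data, funs(. / factors$.), A:D)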

#  ID Type    A           B      C          D
#1  1    X    3   0.8333333   3.75   5.333333
#2  2    X  174 106.6666667  82.50  76.000000
#3  3    X    6   1.6666667   2.50   5.333333
#4  4    Y 1377 849.1666667 312.50 334.666667
#5  5    Y  537 353.3333333 161.25 165.333333
#6  6    Y  173 115.8333333  50.00  50.666667

EDIT

This also works. Given that a data frame is a special case of a list, this is perhaps not surprising.

# Experiment
foo <- list(A = 1, B = 1.2, C = 0.8, D = 0.75)

mutate_at(data, vars(A:D), funs(. / foo$.))

# mutate_each(data, funs(. / foo$.), A:D)

#  ID Type    A           B      C          D
#1  1    X    3   0.8333333   3.75   5.333333
#2  2    X  174 106.6666667  82.50  76.000000
#3  3    X    6   1.6666667   2.50   5.333333
#4  4    Y 1377 849.1666667 312.50 334.666667
#5  5    Y  537 353.3333333 161.25 165.333333
#6  6    Y  173 115.8333333  50.00  50.666667
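Since both mutate_each() and funs() are deprecated, the same name-matching idea can probably be written with across() and cur_column(). This is a sketch that assumes dplyr 1.0.0 or later; it is not part of the original answer, and `normalized` is an illustrative name.

```r
library(dplyr)

# The data and factors from the question, rebuilt as a plain data frame.
data <- data.frame(
  ID = c(1, 2, 3, 4, 5, 6),
  Type = c("X", "X", "X", "Y", "Y", "Y"),
  A = c(3L, 174L, 6L, 1377L, 537L, 173L),
  B = c(1L, 128L, 2L, 1019L, 424L, 139L),
  C = c(3L, 66L, 2L, 250L, 129L, 40L),
  D = c(4L, 57L, 4L, 251L, 124L, 38L)
)
factors <- c(A = 1, B = 1.2, C = 0.8, D = 0.75)

# cur_column() returns the name of the column currently being processed,
# so each column is divided by the factor that carries the same name.
normalized <- data %>%
  mutate(across(A:D, ~ .x / factors[[cur_column()]]))

print(normalized)
```

Because the lookup uses `[[` with a name, `factors` can be a named vector, a list or a one-row data frame.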
answered Oct 23 '22 01:10 by jazzurro


From dplyr 1.0.0, you can do:

data %>%
  rowwise() %>%
  mutate(across(A:D) / factors)

     ID Type      A       B      C      D
  <dbl> <chr> <dbl>   <dbl>  <dbl>  <dbl>
1     1 X         3   0.833   3.75   5.33
2     2 X       174 107.     82.5   76   
3     3 X         6   1.67    2.5    5.33
4     4 Y      1377 849.    312.   335.  
5     5 Y       537 353.    161.   165.  
6     6 Y       173 116.     50     50.7 
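The rowwise() step appears to matter because dividing a whole data frame by a vector recycles the vector down the columns, whereas a single row lines up with the four factors. Here is a small base-R sketch of that behaviour, using a plain data.frame with only the first two rows of the question's data; `df`, `whole` and `first_row` are illustrative names.

```r
# First two rows of columns A:D from the question.
df <- data.frame(A = c(3, 174), B = c(1, 128), C = c(3, 66), D = c(4, 57))
factors <- c(A = 1, B = 1.2, C = 0.8, D = 0.75)

# Whole data frame: the factors are recycled down each column in turn,
# so row 2 of column A is divided by 1.2 instead of 1.
whole <- df / factors

# One row at a time: the four factors line up with the four columns.
first_row <- df[1, ] / factors

print(whole)
print(first_row)
```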
answered Oct 23 '22 01:10 by tmfmnk