How do I sweep specific columns with dplyr?

Tags:

r

dplyr

An incredibly common operation for my type of data is applying a normalisation factor to all columns. This can be done efficiently using sweep or scale:

normalized = scale(data, center = FALSE, scale = factors)
# or
normalized = sweep(data, 2, factors, `/`)

Where

data = structure(list(A = c(3L, 174L, 6L, 1377L, 537L, 173L),
    B = c(1L, 128L, 2L, 1019L, 424L, 139L),
    C = c(3L, 66L, 2L, 250L, 129L, 40L),
    D = c(4L, 57L, 4L, 251L, 124L, 38L)),
    .Names = c("A", "B", "C", "D"),
    class = c("tbl_df", "data.frame"), row.names = c(NA, -6L))

factors = c(A = 1, B = 1.2, C = 0.8, D = 0.75)

However, how do I do this with dplyr when my data has additional columns in front? I can do it in separate statements, but I’d like to do it in one pipeline. This is my data:

data = structure(list(ID = c(1, 2, 3, 4, 5, 6),
    Type = c("X", "X", "X", "Y", "Y", "Y"),
    A = c(3L, 174L, 6L, 1377L, 537L, 173L),
    B = c(1L, 128L, 2L, 1019L, 424L, 139L),
    C = c(3L, 66L, 2L, 250L, 129L, 40L),
    D = c(4L, 57L, 4L, 251L, 124L, 38L)),
    .Names = c("ID", "Type", "A", "B", "C", "D"),
    class = c("tbl_df", "data.frame"), row.names = c(NA, -6L))

And I’d like to mutate the data columns without touching the first two columns. Normally I can do this with mutate_each; however, I cannot pass my normalisation factors to that function:

data %>% mutate_each(funs(. / factors), A:D)

This, unsurprisingly, assumes that I want to divide each column by factors, rather than each column by its matching factor.

asked Feb 03 '15 12:02

Konrad Rudolph

2 Answers

Given akrun's encouragement, let me post what I did as an answer here. My intuition was that you could have R look up a column with the same name while running mutate_each. For instance, if . stands for column A, I thought a column named A in another data.frame might be something dplyr could use. So I created a data frame for factors and then used mutate_each. The outcome seems right. Since I have no technical background, I am afraid I cannot really provide an explanation; I hope you do not mind.

factors <- data.frame(A = 1, B = 1.2, C = 0.8, D = 0.75)

mutate_at(data, vars(A:D), funs(. / factors$.))

# When I answered this question, the following worked,
# but mutate_each() is now deprecated.

# mutate_each(data, funs(. / factors$.), A:D)

#  ID Type    A           B      C          D
#1  1    X    3   0.8333333   3.75   5.333333
#2  2    X  174 106.6666667  82.50  76.000000
#3  3    X    6   1.6666667   2.50   5.333333
#4  4    Y 1377 849.1666667 312.50 334.666667
#5  5    Y  537 353.3333333 161.25 165.333333
#6  6    Y  173 115.8333333  50.00  50.666667

EDIT

This also works. Given that a data frame is a special case of a list, this is perhaps not surprising.

# Experiment
foo <- list(A = 1, B = 1.2, C = 0.8, D = 0.75)

mutate_at(data, vars(A:D), funs(. / foo$.))

# mutate_each(data, funs(. / foo$.), A:D)

#  ID Type    A           B      C          D
#1  1    X    3   0.8333333   3.75   5.333333
#2  2    X  174 106.6666667  82.50  76.000000
#3  3    X    6   1.6666667   2.50   5.333333
#4  4    Y 1377 849.1666667 312.50 334.666667
#5  5    Y  537 353.3333333 161.25 165.333333
#6  6    Y  173 115.8333333  50.00  50.666667
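
Since funs() has been deprecated as well, here is a minimal sketch of the same name-matching idea written with across() and cur_column(), assuming dplyr 1.0.0 or later. The result name normalized is only for illustration; data and factors are the values from the question, rebuilt as a plain data.frame and a named vector.

```r
library(dplyr)

# Data from the question, rebuilt as a plain data.frame.
data <- data.frame(
  ID   = c(1, 2, 3, 4, 5, 6),
  Type = c("X", "X", "X", "Y", "Y", "Y"),
  A    = c(3L, 174L, 6L, 1377L, 537L, 173L),
  B    = c(1L, 128L, 2L, 1019L, 424L, 139L),
  C    = c(3L, 66L, 2L, 250L, 129L, 40L),
  D    = c(4L, 57L, 4L, 251L, 124L, 38L)
)

# Named vector of normalisation factors, one per data column.
factors <- c(A = 1, B = 1.2, C = 0.8, D = 0.75)

# cur_column() returns the name of the column across() is currently
# processing, so each column is divided by the factor with the same name.
normalized <- data %>%
  mutate(across(A:D, ~ .x / factors[[cur_column()]]))

print(normalized)
```

Because the lookup is by name, the order of factors does not have to match the column order, and ID and Type are left untouched.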

answered Oct 23 '22 01:10

jazzurro

From dplyr 1.0.0, you can do:

data %>%
 rowwise() %>%
 mutate(across(A:D)/factors)

     ID Type      A       B      C      D
  <dbl> <chr> <dbl>   <dbl>  <dbl>  <dbl>
1     1 X         3   0.833   3.75   5.33
2     2 X       174 107.     82.5   76   
3     3 X         6   1.67    2.5    5.33
4     4 Y      1377 849.    312.   335.  
5     5 Y       537 353.    161.   165.  
6     6 Y       173 116.     50     50.7
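
As far as I understand it, rowwise() matters here because dividing a data frame by a plain vector recycles the vector down the rows of each column rather than across the columns. A small base R sketch of that behaviour, using made-up values (num and f are hypothetical, not the data from the question):

```r
# Made-up illustration values, not the data from the question.
num <- data.frame(A = c(3, 174), B = c(1, 128))
f   <- c(A = 1, B = 2)

# Whole-frame division: f is recycled down the rows of every column,
# so row 1 is divided by 1 and row 2 by 2, whatever the column.
whole <- num / f

# One row at a time: a one-row frame holds exactly one value per column,
# so element i of f lines up with column i. This mirrors what happens
# inside a rowwise() pipeline.
by_row <- do.call(
  rbind,
  lapply(seq_len(nrow(num)), function(i) num[i, ] / f)
)

print(whole)   # A = 3, 87    B = 1, 64    (not column-matched)
print(by_row)  # A = 3, 174   B = 0.5, 64  (column-matched)
```

Note that the row-wise approach matches factors to columns by position, not by name, so factors must be in the same order as the selected columns.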

answered Oct 23 '22 01:10

tmfmnk
