dplyr mutate/replace several columns on a subset of rows

Tags:

I'm in the process of trying out a dplyr-based workflow (rather than using mostly data.table, which I'm used to), and I've come across a problem that I can't find an equivalent dplyr solution to. I commonly run into the scenario where I need to conditionally update/replace several columns based on a single condition. Here's some example code, with my data.table solution:

library(data.table)  # Create some sample data set.seed(1) dt <- data.table(site = sample(1:6, 50, replace=T),                  space = sample(1:4, 50, replace=T),                  measure = sample(c('cfl', 'led', 'linear', 'exit'), 50,                                 replace=T),                  qty = round(runif(50) * 30),                  qty.exit = 0,                  delta.watts = sample(10.5:100.5, 50, replace=T),                  cf = runif(50))  # Replace the values of several columns for rows where measure is "exit" dt <- dt[measure == 'exit',           `:=`(qty.exit = qty,               cf = 0,               delta.watts = 13)]

Is there a simple dplyr solution to this same problem? I'd like to avoid using ifelse because I don't want to have to type the condition multiple times - this is a simplified example, but there are sometimes many assignments based on a single condition.

Thanks in advance for the help!

383

asked Dec 04 '15 19:12

Chris Newton

2 Answers

These solutions (1) maintain the pipeline, (2) do not overwrite the input and (3) only require that the condition be specified once:

1a) mutate_cond Create a simple function for data frames or data tables that can be incorporated into pipelines. This function is like mutate but only acts on the rows satisfying the condition:

mutate_cond <- function(.data, condition, ..., envir = parent.frame()) {   condition <- eval(substitute(condition), .data, envir)   .data[condition, ] <- .data[condition, ] %>% mutate(...)   .data }  DF %>% mutate_cond(measure == 'exit', qty.exit = qty, cf = 0, delta.watts = 13)

1b) mutate_last This is an alternative function for data frames or data tables which again is like mutate but is only used within group_by (as in the example below) and only operates on the last group rather than every group. Note that TRUE > FALSE so if group_by specifies a condition then mutate_last will only operate on rows satisfying that condition.

mutate_last <- function(.data, ...) {   n <- n_groups(.data)   indices <- attr(.data, "indices")[[n]] + 1   .data[indices, ] <- .data[indices, ] %>% mutate(...)   .data }   DF %>%     group_by(is.exit = measure == 'exit') %>%    mutate_last(qty.exit = qty, cf = 0, delta.watts = 13) %>%    ungroup() %>%    select(-is.exit)

2) factor out condition Factor out the condition by making it an extra column which is later removed. Then use ifelse, replace or arithmetic with logicals as illustrated. This also works for data tables.

library(dplyr)  DF %>% mutate(is.exit = measure == 'exit',               qty.exit = ifelse(is.exit, qty, qty.exit),               cf = (!is.exit) * cf,               delta.watts = replace(delta.watts, is.exit, 13)) %>%        select(-is.exit)

3) sqldf We could use SQL update via the sqldf package in the pipeline for data frames (but not data tables unless we convert them -- this may represent a bug in dplyr. See dplyr issue 1579). It may seem that we are undesirably modifying the input in this code due to the existence of the update but in fact the update is acting on a copy of the input in the temporarily generated database and not on the actual input.

library(sqldf)  DF %>%     do(sqldf(c("update '.'                   set 'qty.exit' = qty, cf = 0, 'delta.watts' = 13                   where measure = 'exit'",                "select * from '.'")))

4) row_case_when Also check out row_case_when defined in Returning a tibble: how to vectorize with case_when? . It uses a syntax similar to case_when but applies to rows.

library(dplyr)  DF %>%   row_case_when(     measure == "exit" ~ data.frame(qty.exit = qty, cf = 0, delta.watts = 13),     TRUE ~ data.frame(qty.exit, cf, delta.watts)   )

Note 1: We used this as DF

set.seed(1) DF <- data.frame(site = sample(1:6, 50, replace=T),                  space = sample(1:4, 50, replace=T),                  measure = sample(c('cfl', 'led', 'linear', 'exit'), 50,                                 replace=T),                  qty = round(runif(50) * 30),                  qty.exit = 0,                  delta.watts = sample(10.5:100.5, 50, replace=T),                  cf = runif(50))

Note 2: The problem of how to easily specify updating a subset of rows is also discussed in dplyr issues 134, 631, 1518 and 1573 with 631 being the main thread and 1573 being a review of the answers here.

187

answered Oct 18 '22 21:10

G. Grothendieck

You can do this with magrittr's two-way pipe %<>%:

library(dplyr) library(magrittr)  dt[dt$measure=="exit",] %<>% mutate(qty.exit = qty,                                     cf = 0,                                       delta.watts = 13)

This reduces the amount of typing, but is still much slower than data.table.

answered Oct 18 '22 21:10

eipi10

Related questions
                            
                                Select columns based on string match - dplyr::select
                            
                                Count the number of all words in a string
                            
                                How can I obtain an 'unbalanced' grid of ggplots?
                            
                                Save a plot in an object
                            
                                Assign multiple new variables on LHS in a single line
                            
                                How to randomize a vector
                            
                                R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph
                            
                                How to create a numeric vector of zero length in R
                            
                                Create sequence of repeated values, in sequence?
                            
                                Pass a string as variable name in dplyr::filter
                            
                                Apply a function to every specified column in a data.table and update by reference
                            
                                How to tell what is in one vector and not another?
                            
                                Arbitrary sections in roxygen docs
                            
                                Explain ggplot2 warning: "Removed k rows containing missing values"
                            
                                break/exit script
                            
                                What are 'user' and 'system' times measuring in R system.time(exp) output?
                            
                                Split a large dataframe into a list of data frames based on common value in column
                            
                                Select multiple elements from a list
                            
                                Why is message() a better choice than print() in R for writing a package?
                            
                                Skipping error in for-loop

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

dplyr mutate/replace several columns on a subset of rows

Tags:

r

data.table

dplyr

Chris Newton

People also ask

2 Answers

G. Grothendieck

eipi10

Recent Activity

Donate For Us